TypeAgent provides comprehensive email integration for ingesting and querying email conversations from multiple sources.
Email Workflow Overview
Working with emails follows a three-step workflow:
1. Fetch raw .eml files from your email provider
2. Parse and index emails into a TypeAgent database
3. Ask natural language questions about your email content
TypeAgent represents emails using the EmailMessage class:
from typeagent.emails.email_message import EmailMessage, EmailMessageMeta
message = EmailMessage(
    text_chunks=["Subject: Project Update", "The project is on track..."],
    metadata=EmailMessageMeta(
        sender="alice@example.com",
        recipients=["bob@example.com", "carol@example.com"],
        cc=["dave@example.com"],
        subject="Project Update",
        id="<message-id@example.com>"
    ),
    timestamp="2024-01-15T10:30:00Z",
    src_url="/path/to/email.eml"
)
Email metadata includes:
- sender: From address
- recipients: To addresses (list)
- cc: CC addresses (list)
- bcc: BCC addresses (list, if available)
- subject: Email subject line
- id: Message-ID header
- timestamp: ISO 8601 timestamp
- src_url: Source file path or identifier
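To make the field list concrete, here is a minimal sketch of how these values relate to the headers of a raw .eml message, using only Python's standard `email` package. It is not TypeAgent's importer: `RAW_EML` and `extract_fields` are hypothetical names introduced for illustration, and the real `import_email_from_file` may normalize values differently.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# A small hand-written message used only for this illustration.
RAW_EML = """\
From: Alice Smith <alice@example.com>
To: bob@example.com, Carol <carol@example.com>
Cc: dave@example.com
Subject: Project Update
Message-ID: <message-id@example.com>
Date: Mon, 15 Jan 2024 10:30:00 +0000

The project is on track.
"""


def extract_fields(raw: str) -> dict:
    """Collect the header values that correspond to the metadata fields above."""
    msg = message_from_string(raw)

    def addresses(header: str) -> list:
        # getaddresses handles comma-separated lists and display names.
        return [addr for _, addr in getaddresses(msg.get_all(header, []))]

    return {
        "sender": msg["From"],  # kept as written, display name included
        "recipients": addresses("To"),
        "cc": addresses("Cc"),
        "bcc": addresses("Bcc"),  # usually empty: Bcc is rarely present in received mail
        "subject": msg["Subject"],
        "id": msg["Message-ID"],
        "timestamp": parsedate_to_datetime(msg["Date"]).isoformat(),
    }


fields = extract_fields(RAW_EML)
print(fields["recipients"])
print(fields["timestamp"])
```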
Importing Email Files
TypeAgent provides utilities for importing .eml files:
from typeagent.emails.email_import import import_email_from_file
# Import single email
email = import_email_from_file("message.eml")
print(f"From: {email.metadata.sender}")
print(f"To: {', '.join(email.metadata.recipients)}")
print(f"Subject: {email.metadata.subject}")
print(f"Body chunks: {len(email.text_chunks)}")
Import from Directory
from typeagent.emails.email_import import import_emails_from_dir
# Import all .eml files from directory
for email in import_emails_from_dir("inbox_dump"):
    print(f"Imported: {email.metadata.subject}")
Import from String
from typeagent.emails.email_import import import_email_string
# Import from MIME string
with open("message.eml", "r") as f:
    mime_string = f.read()
email = import_email_string(mime_string)
The ingest_email.py tool provides a complete email ingestion pipeline:
# Basic ingestion
python tools/ingest_email.py -d emails.db inbox_dump/
# Ingest specific files
python tools/ingest_email.py -d emails.db msg1.eml msg2.eml
# Verbose output
python tools/ingest_email.py -d emails.db inbox_dump/ --verbose
Date Filtering
Filter emails by date range:
# Ingest only January 2024 emails
python tools/ingest_email.py -d emails.db inbox_dump/ \
    --start-date 2024-01-01 \
    --stop-date 2024-02-01
# Date range is [start, stop) - start inclusive, stop exclusive
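The half-open range can be sketched in a few lines of Python. This is an illustration of the [start, stop) rule only, not the tool's code: `parse_day` and `in_date_range` are hypothetical helpers, and treating each YYYY-MM-DD bound as midnight UTC is an assumption of the sketch.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_day(day: str) -> datetime:
    """Interpret YYYY-MM-DD as midnight UTC (an assumption of this sketch)."""
    return datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)


def in_date_range(timestamp: str, start: Optional[str], stop: Optional[str]) -> bool:
    """Return True when start <= timestamp < stop; a missing bound is open."""
    # Older Python versions reject a trailing "Z" in fromisoformat.
    # The timestamp is expected to carry a UTC offset.
    when = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if start is not None and when < parse_day(start):
        return False
    if stop is not None and when >= parse_day(stop):
        return False
    return True


# The first instant of January is included, the first instant of February is not.
print(in_date_range("2024-01-01T00:00:00Z", "2024-01-01", "2024-02-01"))
print(in_date_range("2024-02-01T00:00:00Z", "2024-01-01", "2024-02-01"))
```

Because the stop bound is exclusive, consecutive runs such as January then February never ingest the same message twice.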
Process emails in batches:
# Ingest first 20 emails
python tools/ingest_email.py -d emails.db inbox_dump/ --limit 20
# Skip first 100, process next 50
python tools/ingest_email.py -d emails.db inbox_dump/ \
    --offset 100 \
    --limit 50
Filter Pipeline
The ingestion tool applies filters in this order:
1. Slice the input file list: files[offset:offset+limit]
2. Skip emails that were previously ingested
3. Filter by --start-date and --stop-date
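The order matters because the slice is taken before anything is skipped. The sketch below mirrors the three steps with plain lists. `select_files` and its two predicate arguments are hypothetical stand-ins, and in the real tool the date check runs on the parsed email rather than on a file name.

```python
from typing import Callable, Optional


def select_files(
    files: list,
    offset: int,
    limit: Optional[int],
    already_ingested: Callable[[str], bool],
    matches_date: Callable[[str], bool],
) -> list:
    """Apply the three filters in the documented order."""
    # 1. Slice the input list first, so offset and limit count raw files.
    end = None if limit is None else offset + limit
    window = files[offset:end]
    # 2. Drop files recorded as ingested by an earlier run.
    fresh = [f for f in window if not already_ingested(f)]
    # 3. Apply the date filter last.
    return [f for f in fresh if matches_date(f)]


files = [f"msg{i}.eml" for i in range(10)]
done = {"msg2.eml"}
selected = select_files(
    files,
    offset=1,
    limit=4,
    already_ingested=lambda f: f in done,
    matches_date=lambda f: f != "msg4.eml",  # pretend msg4 is out of range
)
print(selected)  # ['msg1.eml', 'msg3.eml']
```

One consequence of this order: a limit of 4 can produce fewer than 4 newly ingested emails, because skipped and date-filtered files still count toward the slice.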
Programmatic Email Ingestion
Create a custom ingestion pipeline:
import asyncio
from pathlib import Path
from dotenv import load_dotenv
from typeagent.emails.email_import import import_email_from_file
from typeagent.emails.email_memory import EmailMemory
from typeagent.emails.email_message import EmailMessage
from typeagent.knowpro.convsettings import ConversationSettings
from typeagent.storage.utils import create_storage_provider
load_dotenv()
async def ingest_emails():
    # Create settings
    settings = ConversationSettings()
    # Create storage provider
    settings.storage_provider = await create_storage_provider(
        settings.message_text_index_settings,
        settings.related_term_index_settings,
        "emails.db",
        EmailMessage
    )
    # Create email memory
    email_memory = await EmailMemory.create(settings)
    # Process email files
    email_dir = Path("inbox_dump")
    for email_file in email_dir.glob("*.eml"):
        source_id = str(email_file)
        # Skip if already ingested
        if await settings.storage_provider.is_source_ingested(source_id):
            print(f"Skipping {email_file.name} (already ingested)")
            continue
        try:
            # Import and ingest
            email = import_email_from_file(str(email_file))
            await email_memory.add_messages_with_indexing(
                [email],
                source_ids=[source_id]
            )
            print(f"Ingested {email_file.name}")
        except Exception as e:
            print(f"Failed to ingest {email_file.name}: {e}")
            # Mark as failed
            async with settings.storage_provider:
                await settings.storage_provider.mark_source_ingested(
                    source_id,
                    status=e.__class__.__name__
                )

if __name__ == "__main__":
    asyncio.run(ingest_emails())
Downloading Emails
TypeAgent includes tools for downloading emails from various sources:
Download emails using the Gmail API:
# Download 50 most recent emails (default)
cd tools/mail
python gmail_dump.py
# Download 200 emails
python gmail_dump.py --max-results 200
# Output to specific directory
python gmail_dump.py --output-dir ~/gmail_export
Gmail API Setup
1. Navigate to “Credentials” in sidebar
2. Click “+ Create Credentials”
3. Select “OAuth client ID”
4. Choose “Desktop app”
5. Download JSON credentials
6. Save credentials as tools/mail/client_secret.json
7. Run gmail_dump.py
8. Complete OAuth flow in browser
9. Token saved to tools/mail/token.json
The Gmail API token expires after about a week. Delete token.json to trigger re-authentication.
Download emails using Microsoft Graph API:
# Download from Outlook
cd tools/mail
python outlook_dump.py
# Download specific number of emails
python outlook_dump.py --max-results 100
Microsoft Graph Setup
1. Go to Azure Portal
2. Navigate to “App registrations”
3. Register new application
4. Add “Mail.Read” permission
5. Grant admin consent
6. Create client secret
7. Set environment variables:
   - OUTLOOK_CLIENT_ID
   - OUTLOOK_CLIENT_SECRET
   - OUTLOOK_TENANT_ID
8. Run outlook_dump.py
Extract emails from mbox archives:
# Extract from local mbox file
cd tools/mail
python mbox_dump.py archive.mbox
# Extract to specific directory
python mbox_dump.py archive.mbox --output-dir ~/mbox_export
Mbox files are commonly exported from:
- Thunderbird
- Apple Mail
- Gmail Takeout
- Many email servers
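For readers curious what such an extraction involves, the following sketch splits an mbox archive into individual .eml files with Python's standard `mailbox` module. It is not the implementation of mbox_dump.py: `split_mbox`, the numbered file names, and the tiny sample archive are assumptions made for illustration.

```python
import mailbox
import tempfile
from pathlib import Path


def split_mbox(mbox_path: Path, output_dir: Path) -> list:
    """Write each message in an mbox archive to its own numbered .eml file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for index, key in enumerate(box.iterkeys(), start=1):
            target = output_dir / f"{index:06d}.eml"
            # get_bytes returns the message without the mbox "From " separator line.
            target.write_bytes(box.get_bytes(key))
            written.append(target)
    finally:
        box.close()
    return written


# Build a two-message archive in a temporary directory and split it.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "archive.mbox"
    archive.write_text(
        "From alice@example.com Mon Jan 15 10:30:00 2024\n"
        "From: alice@example.com\n"
        "Subject: First\n"
        "\n"
        "Hello\n"
        "\n"
        "From bob@example.com Mon Jan 15 11:00:00 2024\n"
        "From: bob@example.com\n"
        "Subject: Second\n"
        "\n"
        "World\n"
    )
    files = split_mbox(archive, Path(tmp) / "out")
    names = [f.name for f in files]
    first_message = files[0].read_text()

print(names)
```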
Email Features
Reply Detection
TypeAgent automatically detects and extracts only the latest response from email threads:
from typeagent.emails.email_import import is_reply, get_last_response_in_thread
# Check if email is a reply
if is_reply(email_message):
    # Extract only the new content
    body = get_last_response_in_thread(body_text)
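As a rough idea of what extracting the latest response can involve, here is a heuristic sketch that cuts a body at the first line that looks like quoted history. It is not TypeAgent's algorithm: `looks_like_reply`, `latest_response`, and the marker patterns are hypothetical, and real mail clients use many more quoting styles than these.

```python
import re

# Markers that commonly introduce quoted earlier messages (illustrative, not exhaustive).
QUOTE_MARKERS = [
    re.compile(r"^On .+ wrote:\s*$"),
    re.compile(r"^-{2,}\s*Original Message\s*-{2,}\s*$", re.IGNORECASE),
    re.compile(r"^>"),
]


def looks_like_reply(subject: str) -> bool:
    """Treat a subject starting with "Re:" as a reply (a simple heuristic)."""
    return subject.strip().lower().startswith("re:")


def latest_response(body: str) -> str:
    """Keep only the text that precedes the first quoted-history marker."""
    kept = []
    for line in body.splitlines():
        if any(marker.match(line) for marker in QUOTE_MARKERS):
            break
        kept.append(line)
    return "\n".join(kept).strip()


body = (
    "Sounds good, see you then.\n"
    "\n"
    "On Mon, Jan 15, 2024 at 10:30 AM Alice <alice@example.com> wrote:\n"
    "> Can we meet Tuesday?\n"
)
if looks_like_reply("Re: Meeting"):
    print(latest_response(body))  # Sounds good, see you then.
```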
Forward Detection
from typeagent.emails.email_import import is_forwarded, get_forwarded_email_parts
# Check if email is forwarded
if is_forwarded(email_message):
    # Split into parts
    parts = get_forwarded_email_parts(email_text)
Encoding Handling
TypeAgent properly handles RFC 2047 encoded words:
from typeagent.emails.email_import import decode_encoded_words
# Decode encoded headers
subject = decode_encoded_words("=?UTF-8?B?SGVsbG8gV29ybGQ=?=")
print(subject) # "Hello World"
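For comparison, the same decoding can be done with Python's standard library alone. This sketch uses `email.header.decode_header` and `make_header`. The wrapper name `decode_rfc2047` is hypothetical, and nothing here is meant to describe how `decode_encoded_words` is implemented.

```python
from email.header import decode_header, make_header


def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded words in a header value to a plain string."""
    return str(make_header(decode_header(value)))


# Base64 ("B") encoding with UTF-8.
print(decode_rfc2047("=?UTF-8?B?SGVsbG8gV29ybGQ=?="))  # Hello World
# Quoted-printable ("Q") encoding with ISO-8859-1.
print(decode_rfc2047("=?ISO-8859-1?Q?Caf=E9?="))  # Café
```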
Querying Emails
Once ingested, query emails using natural language:
# Interactive query
python tools/query.py --database emails.db
# Single query
python tools/query.py --database emails.db \
    --query "What emails did Alice send about the project?"
Email Query Examples
from typeagent import create_conversation
from typeagent.emails.email_message import EmailMessage
conversation = await create_conversation("emails.db", EmailMessage)
# Who questions
answer = await conversation.query("Who sent emails about the meeting?")
answer = await conversation.query("Who did Alice email yesterday?")
# What questions
answer = await conversation.query("What was discussed in the project emails?")
answer = await conversation.query("What action items were mentioned?")
# When questions
answer = await conversation.query("When was the deadline mentioned?")
answer = await conversation.query("What emails were sent last week?")
# Topic searches
answer = await conversation.query("Find emails about budget approval")
answer = await conversation.query("Show me emails related to deployment")
Emails are automatically enriched with semantic knowledge:
# EmailMessage.metadata.get_knowledge() extracts:
# - Entities: People (sender, recipients), email addresses
# - Actions: "sent email", "received email"
# - Topics: Subject line
# - Relationships: sender -> recipient connections
knowledge = email.metadata.get_knowledge()
print(f"Entities: {len(knowledge.entities)}")
print(f"Actions: {len(knowledge.actions)}")
print(f"Topics: {knowledge.topics}")
Email addresses are parsed into entities:
# "Alice Smith <alice@example.com>" becomes:
# - Entity: "Alice Smith" (type: person)
# - Facet: email_address = alice@example.com
# - Entity: "alice@example.com" (type: email_address, alias)
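The mapping from an address string to entities can be sketched with `email.utils.parseaddr`. The dictionaries below are a simplified stand-in for TypeAgent's entity objects, and `address_to_entities` is a hypothetical helper. How the library treats an address without a display name is not stated here, so the sketch simply emits the address entity alone in that case.

```python
from email.utils import parseaddr


def address_to_entities(raw: str) -> list:
    """Split 'Name <addr>' into a person entity and an email_address entity."""
    display_name, address = parseaddr(raw)
    entities = []
    if display_name:
        entities.append(
            {
                "name": display_name,
                "type": "person",
                "facets": {"email_address": address},
            }
        )
    if address:
        # The bare address acts as an alias for the same correspondent.
        entities.append({"name": address, "type": "email_address", "facets": {}})
    return entities


for entity in address_to_entities("Alice Smith <alice@example.com>"):
    print(entity)
```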
Email actions capture communication:
# For email from alice@example.com to bob@example.com:
# - Action: "Alice Smith" sent email to "Bob Jones"
# - Action: "alice@example.com" sent email to "bob@example.com"
# - Action: "Bob Jones" received email from "Alice Smith"
# - Action: "bob@example.com" received email from "alice@example.com"
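The four actions listed above follow a simple pattern: one "sent" and one "received" action for the display names, and the same pair again for the bare addresses. A minimal sketch of that pattern, using plain tuples instead of TypeAgent's action objects (`email_actions` is a hypothetical helper, and dropping a pair when either side lacks a display name is an assumption):

```python
from email.utils import parseaddr


def email_actions(sender: str, recipient: str) -> list:
    """Return (subject, verb, object) triples for one sender/recipient pair."""
    sender_name, sender_addr = parseaddr(sender)
    recipient_name, recipient_addr = parseaddr(recipient)
    # Pair display name with display name, and address with address.
    pairs = [(sender_name, recipient_name), (sender_addr, recipient_addr)]
    pairs = [(s, r) for s, r in pairs if s and r]
    sent = [(s, "sent email", r) for s, r in pairs]
    received = [(r, "received email", s) for s, r in pairs]
    return sent + received


actions = email_actions(
    "Alice Smith <alice@example.com>",
    "Bob Jones <bob@example.com>",
)
for subject, verb, obj in actions:
    print(f'"{subject}" {verb} "{obj}"')
```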
Email ingestion can take 1-2 seconds per message due to LLM-based knowledge extraction.
Batch Size Configuration
from typeagent.knowpro.convsettings import ConversationSettings
settings = ConversationSettings()
# Adjust concurrent extraction (default: 4)
settings.semantic_ref_index_settings.batch_size = 4
Progress Monitoring
import time
start_time = time.time()
success_count = 0
batch_size = 4
for i, email in enumerate(emails):
    await email_memory.add_messages_with_indexing([email])
    success_count += 1
    # Print progress periodically
    if (success_count % batch_size) == 0:
        elapsed = time.time() - start_time
        semref_count = await semref_collection.size()
        print(f"{success_count} imported | "
              f"{semref_count} semrefs | "
              f"{elapsed:.1f}s elapsed")
Error Handling
Handle common email ingestion errors:
import traceback
import openai
success_count = 0
failed_count = 0
skipped_count = 0
for source_id, email_file in email_files:
    try:
        email = import_email_from_file(str(email_file))
        # Apply date filter
        if not email_matches_date_filter(
            email.timestamp,
            start_date,
            stop_date
        ):
            skipped_count += 1
            continue
        await email_memory.add_messages_with_indexing(
            [email],
            source_ids=[source_id]
        )
        success_count += 1
    except openai.AuthenticationError as e:
        print(f"Authentication error: {e}")
        break  # Fatal error
    except Exception as e:
        failed_count += 1
        print(f"Error processing {source_id}: {e}")
        # Mark as failed
        async with storage_provider:
            await storage_provider.mark_source_ingested(
                source_id,
                status=e.__class__.__name__
            )
        if verbose:
            traceback.print_exc()
print(f"\nSuccessfully imported {success_count} emails")
print(f"Skipped {skipped_count} emails (date filter)")
print(f"Failed to import {failed_count} emails")
Example: Complete Email Pipeline
Here’s a complete example from download to query:
#!/bin/bash
# complete_email_pipeline.sh
set -e # Exit on error
# 1. Download emails from Gmail
echo "Downloading emails..."
cd tools/mail
python gmail_dump.py --max-results 100 --output-dir ../../email_dump
cd ../..
# 2. Ingest emails into database
echo "Ingesting emails..."
python tools/ingest_email.py \
    -d emails.db \
    email_dump/ \
    --start-date 2024-01-01 \
    --verbose
# 3. Query the database
echo "Database ready for queries!"
echo "Run: python tools/query.py --database emails.db"