Semantic recall enables agents to retrieve contextually relevant messages from conversation history using vector embeddings and similarity search. This provides long-term memory beyond recent message limits.
How It Works
The SemanticRecall processor operates as both an input and an output processor:
- On input: performs a semantic search over historical messages and adds relevant context
- On output: creates embeddings for new messages to enable future semantic search
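The two phases can be illustrated with a minimal in-memory sketch. Everything in it is hypothetical and for illustration only: the letter-frequency toyEmbed function stands in for a real embedding model, a plain array stands in for the vector store, and RecallSketch is not Mastra's SemanticRecall class.

```typescript
// Illustration only: a toy stand-in for the output (embed + store) and
// input (search + inject) phases. Not Mastra's implementation.

type StoredMessage = { id: string; text: string; vector: number[] };

// Hypothetical toy embedder: a letter-frequency vector over a-z.
function toyEmbed(text: string): number[] {
  const v = new Array<number>(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i] += 1;
  }
  return v;
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

class RecallSketch {
  private store: StoredMessage[] = [];

  // Output phase: embed each new message so later searches can find it.
  processOutput(id: string, text: string): void {
    this.store.push({ id, text, vector: toyEmbed(text) });
  }

  // Input phase: embed the incoming message, rank stored messages by
  // similarity, and return the best matches as extra context.
  processInput(text: string, topK: number): StoredMessage[] {
    const q = toyEmbed(text);
    return [...this.store]
      .sort((a, b) => cosine(q, b.vector) - cosine(q, a.vector))
      .slice(0, topK);
  }
}

const recall = new RecallSketch();
recall.processOutput("m1", "My favorite color is blue");
recall.processOutput("m2", "zzz qqq");
const context = recall.processInput("What is my favorite color?", 1);
```

Here the query shares many letters with "m1" and none with "m2", so the single retrieved message is "m1". A real embedder captures meaning rather than letters, but the store-then-search flow is the same.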
Basic Configuration
Enable semantic recall by configuring a vector store and an embedder.
Configuration Options
Enable semantic recall with the defaults (true), or configure it with the options below:
- topK: number of most similar messages to retrieve from the vector database
- messageRange: amount of surrounding context to include with each retrieved message
- scope: scope of the semantic search
  - thread: search only within the current conversation thread
  - resource: search across all threads owned by the user/resource
- threshold: minimum similarity score (0-1); messages below this threshold are filtered out
- Index configuration (PostgreSQL-specific): vector index settings; see Index Optimization below.
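The way topK, threshold, and messageRange could combine is sketched below over precomputed similarity scores. Assumptions: scores[i] is the similarity of history message i to the query, and a messageRange of n means n messages before and n after each hit. The function name selectRecalled is hypothetical; this mirrors the option descriptions above and is not Mastra's code.

```typescript
// Hypothetical selection step: threshold filter -> topK -> messageRange expansion.
function selectRecalled(
  scores: number[],
  topK: number,
  threshold: number,
  messageRange: number,
): number[] {
  // 1. Drop matches below the similarity threshold.
  // 2. Keep the topK highest-scoring matches.
  const hits = scores
    .map((score, index) => ({ score, index }))
    .filter((h) => h.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);

  // 3. Add each hit's neighbors, clamped to the history bounds; the Set
  //    removes duplicates where two ranges overlap.
  const selected = new Set<number>();
  for (const { index } of hits) {
    const lo = Math.max(0, index - messageRange);
    const hi = Math.min(scores.length - 1, index + messageRange);
    for (let i = lo; i <= hi; i++) selected.add(i);
  }

  // Return message indices in chronological order.
  return Array.from(selected).sort((a, b) => a - b);
}

const picked = selectRecalled([0.1, 0.9, 0.2, 0.3, 0.2, 0.1, 0.8, 0.75], 2, 0.7, 1);
```

Three messages pass the 0.7 threshold (indices 1, 6, 7), topK keeps the best two (1 and 6), and a range of 1 expands them to indices 0, 1, 2, 5, 6, 7.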
Configuration Examples
Simple Setup
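In the simplest setup, semantic recall is switched on and its defaults are used. The sketch below assumes the options are passed as a plain object; SemanticRecallSketch and the semanticRecall key are a hypothetical local mirror of the options listed above, not types exported by Mastra. A working setup still needs a vector store and an embedder, as described under Vector Store Setup and Embedder Configuration.

```typescript
// Hypothetical local mirror of the documented options (not a Mastra export).
type SemanticRecallSketch =
  | boolean
  | {
      topK?: number;
      messageRange?: number;
      scope?: "thread" | "resource";
      threshold?: number;
    };

// Simple setup: enable semantic recall and rely on its defaults.
const simpleOptions: { semanticRecall: SemanticRecallSketch } = {
  semanticRecall: true,
};
```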
Advanced Configuration
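An advanced setup sets each option explicitly. In this sketch, topK and threshold follow the Best Practices guidance (3-5 results, a threshold around 0.7), while the messageRange value of 2 is an arbitrary example. SemanticRecallOptionsSketch is a hypothetical local type mirroring the documented options, not Mastra's own type.

```typescript
// Hypothetical local mirror of the documented options (not a Mastra export).
interface SemanticRecallOptionsSketch {
  topK: number; // most similar messages to retrieve
  messageRange: number; // surrounding messages included with each hit
  scope: "thread" | "resource"; // where to search
  threshold: number; // minimum similarity score, 0-1
}

// topK and threshold follow the Best Practices guidance;
// messageRange: 2 is an arbitrary example value.
const advancedOptions: SemanticRecallOptionsSketch = {
  topK: 5,
  messageRange: 2,
  scope: "resource",
  threshold: 0.7,
};
```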
Thread-Scoped Recall
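Thread-scoped recall limits the search to the current conversation thread, which suits session-specific recall. As in the other sketches, ThreadScopedRecallSketch is a hypothetical local type whose field names mirror the documented options.

```typescript
// Hypothetical local type mirroring the documented options (not a Mastra export).
interface ThreadScopedRecallSketch {
  topK: number;
  scope: "thread" | "resource";
}

// Search only the current thread; messages in the user's other threads are ignored.
const threadScopedOptions: ThreadScopedRecallSketch = {
  topK: 3,
  scope: "thread",
};
```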
Vector Store Setup
Semantic recall requires a vector database. Mastra supports multiple vector store providers.
Embedder Configuration
Choose an embedding model suited to your use case. Its output dimension must match the dimension of the vector store index.
Index Optimization
For PostgreSQL with pgvector, you can tune semantic recall performance through the index configuration.
Index types:
- hnsw: best performance for most cases (recommended)
- ivfflat: good balance of speed and recall
- flat: exact nearest-neighbor search (slow, but 100% recall)
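The sketch below shows what a flat index computes: an exact scan that scores every stored vector, so it never misses a true nearest neighbor (100% recall), at a cost proportional to the number of vectors on every query. hnsw and ivfflat are approximate methods that avoid scoring everything in exchange for speed. This is illustrative TypeScript only; pgvector implements these index types inside PostgreSQL, and flatSearch is a hypothetical name.

```typescript
// Cosine similarity; returns 0 when either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Exact (flat) k-nearest-neighbor search: score every vector, then keep the best k.
function flatSearch(query: number[], vectors: number[][], k: number): number[] {
  return vectors
    .map((v, i) => ({ i, s: cosineSimilarity(query, v) }))
    .sort((x, y) => y.s - x.s)
    .slice(0, k)
    .map((x) => x.i);
}

const nearest = flatSearch([1, 0], [[1, 0], [0, 1], [0.9, 0.1]], 2);
```

Index 0 is identical to the query and index 2 points in almost the same direction, so the two nearest neighbors are indices 0 and 2.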
Cross-Thread Recall
When using scope: 'resource', semantic recall can retrieve messages from other threads owned by the same user/resource.
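One way to picture the difference between the two scopes is as a filter applied to candidate messages before similarity ranking. The sketch assumes each stored message records a threadId and a resourceId; those field names and the candidatesForScope function are hypothetical.

```typescript
type Scope = "thread" | "resource";

// Hypothetical stored-message shape for this sketch.
interface StoredMsg {
  id: string;
  threadId: string;
  resourceId: string;
}

// thread: only the current thread; resource: every thread owned by the same resource.
function candidatesForScope(
  messages: StoredMsg[],
  scope: Scope,
  current: { threadId: string; resourceId: string },
): StoredMsg[] {
  return messages.filter((m) =>
    scope === "thread"
      ? m.threadId === current.threadId
      : m.resourceId === current.resourceId,
  );
}

const history: StoredMsg[] = [
  { id: "a", threadId: "t1", resourceId: "u1" },
  { id: "b", threadId: "t2", resourceId: "u1" },
  { id: "c", threadId: "t3", resourceId: "u2" },
];
const threadOnly = candidatesForScope(history, "thread", { threadId: "t1", resourceId: "u1" });
const crossThread = candidatesForScope(history, "resource", { threadId: "t1", resourceId: "u1" });
```

With thread scope only message "a" is a candidate. With resource scope, "b" from the same user's other thread is included too, while "c", which belongs to a different resource, is never searched.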
Embedding Cache
SemanticRecall uses a global embedding cache to avoid redundant embedding API calls. The cache uses xxhash for fast key generation, and each key includes the index name to keep embeddings from different models/dimensions isolated.
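The caching idea can be sketched as follows. Because xxhash is not built into JavaScript, this sketch swaps in a 32-bit FNV-1a hash over UTF-16 code units. The EmbeddingCacheSketch class and its apiCalls counter are hypothetical, and hash collisions are ignored for brevity.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: a stand-in for xxhash in this sketch.
function fnv1a32(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

class EmbeddingCacheSketch {
  private cache = new Map<string, number[]>();
  private embed: (text: string) => number[];
  apiCalls = 0; // counts calls that reach the (simulated) embedding API

  constructor(embed: (text: string) => number[]) {
    this.embed = embed;
  }

  get(indexName: string, text: string): number[] {
    // The index name is part of the key, so different models/dimensions never share entries.
    const key = `${indexName}:${fnv1a32(text)}`;
    const hit = this.cache.get(key);
    if (hit !== undefined) return hit;
    this.apiCalls++;
    const vector = this.embed(text);
    this.cache.set(key, vector);
    return vector;
  }
}

const cache = new EmbeddingCacheSketch((t) => [t.length]);
cache.get("idx-a", "hello");
cache.get("idx-a", "hello"); // served from the cache
cache.get("idx-b", "hello"); // different index name, so a new embedding call
```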
Implementation Details
The SemanticRecall processor handles both semantic search and embedding creation.
Best Practices
Choose the Right Scope
Use resource scope for cross-conversation context and thread scope for session-specific recall.
Tune TopK
Start with 3-5 similar messages. More results increase context but also token usage.
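A rough upper bound makes the token trade-off concrete. The sketch assumes a messageRange of n adds n messages before and n after each hit and that the ranges do not overlap. avgTokensPerMessage is a hypothetical figure you would measure for your own conversations.

```typescript
// Upper bound on recalled messages: each of the topK hits brings itself
// plus messageRange neighbors on each side (assumed, non-overlapping).
function maxRecalledTokens(
  topK: number,
  messageRange: number,
  avgTokensPerMessage: number,
): number {
  const messages = topK * (2 * messageRange + 1);
  return messages * avgTokensPerMessage;
}

const budget = maxRecalledTokens(5, 2, 100); // 5 hits * 5 messages * 100 tokens
```

With messageRange set to 2, raising topK from 3 to 5 raises the bound from 15 to 25 recalled messages.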
Set a Threshold
Filter low-quality matches with a similarity threshold (e.g., 0.7).
Optimize Indexes
Use HNSW indexes for PostgreSQL to improve query performance.
Troubleshooting
No results returned
- Check that embeddings were created (verify that the vector store has data)
- Lower the threshold value, if one is set
- Ensure that scope matches your use case (thread vs resource)
- Verify that the embedder's dimensions match the vector store index
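A dimension mismatch is easiest to catch early with an explicit check. assertDimensionsMatch is a hypothetical helper. The expected dimension would come from your embedding model's documentation and your vector index definition.

```typescript
// Throw when an embedding's length differs from the index dimension.
function assertDimensionsMatch(vector: number[], indexDimension: number): void {
  if (vector.length !== indexDimension) {
    throw new Error(
      `Embedding has ${vector.length} dimensions but the index expects ${indexDimension}; ` +
        "re-create the index or switch to a matching embedder.",
    );
  }
}
```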
Slow query performance
- Use HNSW index type for PostgreSQL
- Reduce the topK value
- Check vector store connection and query performance
- Consider using a smaller embedding model
High token usage
- Reduce topK (fewer messages retrieved)
- Reduce messageRange (less surrounding context)
- Increase threshold (only highly relevant matches)
- Balance with lastMessages to avoid redundancy (recalled messages that are already in the recent history)
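The redundancy between recall and recent history can be pictured as a de-duplication step: drop recalled messages that already appear in the lastMessages window so they are not sent twice. This is only a sketch of the idea, with a hypothetical Msg shape and dedupeRecalled function; it does not describe how Mastra merges the two sources.

```typescript
// Hypothetical message shape for this sketch.
interface Msg {
  id: string;
  content: string;
}

// Keep only recalled messages that are not already in the recent window.
function dedupeRecalled(recalled: Msg[], recent: Msg[]): Msg[] {
  const recentIds = new Set(recent.map((m) => m.id));
  return recalled.filter((m) => !recentIds.has(m.id));
}

const kept = dedupeRecalled(
  [
    { id: "a", content: "old fact" },
    { id: "b", content: "older fact" },
    { id: "c", content: "recent fact" },
  ],
  [
    { id: "c", content: "recent fact" },
    { id: "d", content: "latest message" },
  ],
);
```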
Next Steps
Working Memory
Store structured user information across conversations
RAG Overview
Learn about document-based RAG in Mastra
Conversation History
Manage recent message persistence