Semantic recall enables agents to retrieve contextually relevant messages from conversation history using vector embeddings and similarity search. This gives agents long-term memory that reaches past the recent-message window set by lastMessages.

How It Works

The SemanticRecall processor operates as both an input and output processor:
  1. On Input: Performs semantic search on historical messages and adds relevant context
  2. On Output: Creates embeddings for new messages to enable future semantic search
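The two phases can be pictured with a toy in-memory model. This is only a sketch: every name in it (`toyEmbed`, `onOutput`, `onInput`) is hypothetical, and the "embedding" is a letter-frequency vector standing in for a real embedding model and vector store.

```typescript
// Toy model of the two phases. Not Mastra's implementation.
type StoredVector = { messageId: string; text: string; vector: number[] };

const index: StoredVector[] = [];

// Stand-in embedder: counts occurrences of each letter a-z.
function toyEmbed(text: string): number[] {
  const counts: number[] = [];
  for (let i = 0; i < 26; i++) counts.push(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const code = lower.charCodeAt(i) - 97;
    if (code >= 0 && code < 26) counts[code] = (counts[code] ?? 0) + 1;
  }
  return counts;
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return normA === 0 || normB === 0 ? 0 : dot / (normA * normB);
}

// Output phase: embed a new message so later queries can find it.
function onOutput(messageId: string, text: string): void {
  index.push({ messageId, text, vector: toyEmbed(text) });
}

// Input phase: embed the query and return the topK most similar messages.
function onInput(query: string, topK: number): StoredVector[] {
  const queryVector = toyEmbed(query);
  return index
    .map(entry => ({ entry, score: cosine(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map(scored => scored.entry);
}

onOutput('m1', 'I am allergic to peanuts');
onOutput('m2', 'zzz qqq xxx');
const recalled = onInput('peanut allergy', 1);
```

Here `recalled` contains only the peanut message, because the second message shares no letters with the query and scores zero.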

Basic Configuration

Enable semantic recall with vector storage and an embedder:
import { Memory } from '@mastra/memory';
import { PgVector } from '@mastra/pg';
import { LibSQLStore } from '@mastra/libsql';

const memory = new Memory({
  storage: new LibSQLStore({
    id: 'agent-memory',
    url: 'file:./memory.db'
  }),
  vector: new PgVector({
    connectionString: process.env.DATABASE_URL
  }),
  embedder: 'openai/text-embedding-3-small',
  options: {
    lastMessages: 10,
    semanticRecall: {
      topK: 5,
      messageRange: 2,
      scope: 'resource'
    }
  }
});

Configuration Options

semanticRecall (boolean | SemanticRecall)
Enable semantic recall with defaults (true), or pass an object with the options below.
  • topK (number, default: 4): Number of most similar messages to retrieve from the vector database.
  • messageRange (number | { before: number; after: number }, default: 1): Amount of surrounding context to include with each retrieved message.
  • scope ('thread' | 'resource', default: 'resource'): Scope of semantic search. thread searches only within the current conversation thread; resource searches across all threads owned by the user/resource.
  • threshold (number, no default): Minimum similarity score (0-1). Messages below this threshold are filtered out.
  • indexConfig (VectorIndexConfig): Vector index configuration (PostgreSQL-specific). See Index Optimization below.
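Because messageRange accepts either a number or a { before, after } object, it helps to think of both forms as one normalized shape. The helper below is a hypothetical sketch, and it assumes that a bare number applies the same count on both sides of the matched message.

```typescript
// Hypothetical helper, not part of Mastra's public API.
type MessageRange = number | { before: number; after: number };

function normalizeMessageRange(range: MessageRange): { before: number; after: number } {
  // Assumption: a bare number means "this many messages before and after".
  return typeof range === 'number' ? { before: range, after: range } : range;
}

const symmetric = normalizeMessageRange(2);
const asymmetric = normalizeMessageRange({ before: 2, after: 3 });
```

Under that assumption, `messageRange: 2` and `messageRange: { before: 2, after: 2 }` describe the same context window.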

Configuration Examples

Simple Setup

const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: true // Enable with defaults
  }
});

Advanced Configuration

const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-large',
  embedderOptions: {
    providerOptions: {
      openai: {
        dimensions: 1536 // Custom embedding dimensions
      }
    }
  },
  options: {
    lastMessages: 10,
    semanticRecall: {
      topK: 8,
      messageRange: { before: 2, after: 3 },
      scope: 'resource',
      threshold: 0.7,
      indexConfig: {
        type: 'hnsw',
        metric: 'dotproduct',
        hnsw: {
          m: 16,
          efConstruction: 64
        }
      }
    }
  }
});

Thread-Scoped Recall

const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: {
      topK: 5,
      scope: 'thread' // Only search current thread
    }
  }
});

Vector Store Setup

Semantic recall requires a vector database. Mastra supports multiple providers; this example uses PgVector:
import { PgVector } from '@mastra/pg';

const vector = new PgVector({
  connectionString: process.env.DATABASE_URL
});

Embedder Configuration

Choose an embedding model compatible with your use case:
const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-small',
  embedderOptions: {
    providerOptions: {
      openai: {
        dimensions: 1536
      }
    }
  },
  options: {
    semanticRecall: true
  }
});

Index Optimization

For PostgreSQL with pgvector, you can optimize semantic recall performance with index configuration:
const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: {
      topK: 5,
      indexConfig: {
        type: 'hnsw', // Hierarchical Navigable Small World
        metric: 'dotproduct', // Best for OpenAI embeddings
        hnsw: {
          m: 16, // Links per node
          efConstruction: 64 // Construction quality
        }
      }
    }
  }
});
Index Types:
  • hnsw: Best performance for most cases (recommended)
  • ivfflat: Good balance of speed and recall
  • flat: Exact nearest neighbor (slow but 100% recall)
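One likely reason dotproduct suits OpenAI embeddings is that those vectors are normalized to unit length, in which case dot product and cosine similarity give the same score while dot product skips the norm computation. The sketch below checks that equivalence on small hand-made vectors; the function names are illustrative only.

```typescript
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

function norm(v: number[]): number {
  return Math.sqrt(dot(v, v));
}

// Scale a vector to unit length.
function normalize(v: number[]): number[] {
  const n = norm(v);
  return v.map(x => x / n);
}

function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (norm(a) * norm(b));
}

const u = normalize([3, 4]); // approximately [0.6, 0.8]
const w = normalize([1, 0]); // [1, 0]

const dotScore = dot(u, w);
const cosineScore = cosineSimilarity(u, w);
const scoresMatch = Math.abs(dotScore - cosineScore) < 1e-12;
```

For vectors that are not unit length the two metrics can rank results differently, so the metric should match how the embedding model produces its vectors.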

Cross-Thread Recall

When using scope: 'resource', semantic recall can retrieve messages from other threads that belong to the same resource:
import { Agent } from '@mastra/core/agent';

const memory = new Memory({
  storage,
  vector,
  embedder: 'openai/text-embedding-3-small',
  options: {
    semanticRecall: {
      topK: 5,
      messageRange: 2,
      scope: 'resource' // Search across all user threads
    }
  }
});

const agent = new Agent({
  name: 'Assistant',
  model: 'openai/gpt-4o',
  memory
});

// Query references information from previous conversations
const result = await agent.generate(
  'What did I say about my dietary preferences?',
  {
    threadId: 'current-thread',
    resourceId: 'user-123'
  }
);
Cross-thread messages are formatted with timestamps:
The following messages were remembered from a different conversation:
<remembered_from_other_conversation>

the following messages are from 2024, Feb, 15
Message from previous conversation at 3:45 PM: User: I'm allergic to peanuts
Message from previous conversation at 3:46 PM: Assistant: I'll make sure to avoid peanuts in all recommendations

<end_remembered_from_other_conversation>
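A formatter that produces this layout could look roughly like the following. It is a hypothetical reconstruction from the sample output above, not Mastra's code, and it assumes UTC timestamps so the result is deterministic; the real formatter may use local time.

```typescript
type RecalledMessage = { role: 'User' | 'Assistant'; text: string; createdAt: Date };

const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

function formatDay(d: Date): string {
  return `${d.getUTCFullYear()}, ${MONTHS[d.getUTCMonth()] ?? ''}, ${d.getUTCDate()}`;
}

function formatTime(d: Date): string {
  const hours = d.getUTCHours();
  const minutes = String(d.getUTCMinutes());
  const paddedMinutes = minutes.length < 2 ? '0' + minutes : minutes;
  const hour12 = hours % 12 === 0 ? 12 : hours % 12;
  return `${hour12}:${paddedMinutes} ${hours < 12 ? 'AM' : 'PM'}`;
}

function formatRecalled(messages: RecalledMessage[]): string {
  const lines: string[] = [];
  let currentDay = '';
  for (const m of messages) {
    const day = formatDay(m.createdAt);
    // Emit one date header per day, then the messages from that day.
    if (day !== currentDay) {
      lines.push(`the following messages are from ${day}`);
      currentDay = day;
    }
    lines.push(`Message from previous conversation at ${formatTime(m.createdAt)}: ${m.role}: ${m.text}`);
  }
  return ['<remembered_from_other_conversation>']
    .concat(lines, ['<end_remembered_from_other_conversation>'])
    .join('\n');
}

const formatted = formatRecalled([
  { role: 'User', text: "I'm allergic to peanuts", createdAt: new Date(Date.UTC(2024, 1, 15, 15, 45)) },
  { role: 'Assistant', text: "I'll make sure to avoid peanuts in all recommendations", createdAt: new Date(Date.UTC(2024, 1, 15, 15, 46)) }
]);
```

Grouping by day keeps the date header from repeating for every message recalled from the same conversation.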

Embedding Cache

SemanticRecall uses a global embedding cache to avoid redundant API calls:
import { globalEmbeddingCache } from '@mastra/core/processors';

// Clear cache if needed
globalEmbeddingCache.clear();

// Check cache size
console.log(`Cache size: ${globalEmbeddingCache.size}`);
The cache uses xxhash for fast key generation and includes the index name to ensure isolation between different embedding models/dimensions.
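The keying scheme can be sketched as follows. This is an illustration under stated assumptions: `cachedEmbed` and `fakeEmbed` are hypothetical, and a small FNV-1a-style hash stands in for xxhash so the example has no dependencies.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units; a stand-in for xxhash.
function hashText(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

const cache = new Map<string, number[]>();
let embedCalls = 0;

// Pretend embedder that counts how often it is invoked.
function fakeEmbed(text: string): number[] {
  embedCalls++;
  return [text.length];
}

function cachedEmbed(indexName: string, text: string): number[] {
  // Prefixing the index name keeps vectors from different
  // models or dimensions in separate key spaces.
  const key = `${indexName}:${hashText(text)}`;
  const hit = cache.get(key);
  if (hit) return hit;
  const vector = fakeEmbed(text);
  cache.set(key, vector);
  return vector;
}

cachedEmbed('memory_1536', 'hello'); // miss: calls the embedder
cachedEmbed('memory_1536', 'hello'); // hit: served from the cache
cachedEmbed('memory_3072', 'hello'); // miss: same text, different index
```

The embedder runs twice for three lookups, because the repeated text under the same index name is served from the cache.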

Implementation Details

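The SemanticRecall processor handles semantic search and embedding creation. A simplified excerpt of its input step:
async processInput(args) {
  const { messages, messageList, requestContext } = args;
  // threadId, resourceId, and indexName are resolved earlier (omitted in this excerpt)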
  
  // Extract user query from last user message
  const userQuery = this.extractUserQuery(messages);
  if (!userQuery) return messageList;
  
  // Generate embeddings for the query
  const { embeddings, dimension } = await this.embedMessageContent(
    userQuery,
    indexName
  );
  
  // Ensure vector index exists
  await this.ensureVectorIndex(indexName, dimension);
  
  // Perform vector search
  const results = await this.vector.query({
    indexName,
    queryVector: embeddings[0],
    topK: this.topK,
    filter: this.scope === 'resource' 
      ? { resource_id: resourceId } 
      : { thread_id: threadId }
  });
  
  // Retrieve messages with context
  const similarMessages = await this.storage.listMessages({
    threadId,
    resourceId,
    include: results.map(r => ({
      id: r.metadata?.message_id,
      threadId: r.metadata?.thread_id,
      withNextMessages: this.messageRange.after,
      withPreviousMessages: this.messageRange.before
    }))
  });
  
  // Add to message list
  messageList.add(similarMessages, 'memory');
  return messageList;
}
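The effect of withPreviousMessages and withNextMessages can be pictured as expanding each hit into a window of neighboring messages. The sketch below works on message positions within a single thread; `expandWithContext` is a hypothetical helper, and merging overlapping windows is an assumption about the desired behavior rather than a statement about the storage layer.

```typescript
// Returns the sorted positions of all messages to include, given the
// positions of the vector-search hits and the context range.
function expandWithContext(
  threadLength: number,
  hitPositions: number[],
  before: number,
  after: number
): number[] {
  const keep: boolean[] = [];
  for (let i = 0; i < threadLength; i++) keep.push(false);

  for (const hit of hitPositions) {
    const start = Math.max(0, hit - before);
    const end = Math.min(threadLength - 1, hit + after);
    for (let i = start; i <= end; i++) keep[i] = true;
  }

  const positions: number[] = [];
  keep.forEach((included, i) => {
    if (included) positions.push(i);
  });
  return positions;
}

// Hits at positions 2 and 4 with one message of context on each side.
const expanded = expandWithContext(10, [2, 4], 1, 1);
```

Here `expanded` is [1, 2, 3, 4, 5]: position 3 falls inside both windows but is included only once.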

Best Practices

Choose the Right Scope

Use resource scope for cross-conversation context, thread scope for session-specific recall.

Tune TopK

Start with 3-5 similar messages. More results increase context but also token usage.
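A quick upper bound makes the trade-off concrete: each hit can bring in the matched message plus its surrounding context. The helper name below is hypothetical, and the bound assumes no overlap between context windows.

```typescript
// Worst-case number of recalled messages added to the context.
function maxRecalledMessages(topK: number, before: number, after: number): number {
  return topK * (before + 1 + after);
}

// topK: 5 with messageRange: 2 on each side
const basicBudget = maxRecalledMessages(5, 2, 2);

// topK: 8 with messageRange: { before: 2, after: 3 }
const advancedBudget = maxRecalledMessages(8, 2, 3);
```

The first configuration can add up to 25 messages and the second up to 48, in addition to whatever lastMessages already includes.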

Set a Threshold

Filter low-quality matches with a similarity threshold (e.g., 0.7).
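Conceptually, the threshold is a filter applied to the scored search results. The sketch below uses a hypothetical `Hit` type and `applyThreshold` helper, and it assumes a score exactly equal to the threshold is kept.

```typescript
type Hit = { id: string; score: number };

// Drops hits scoring below the threshold; no threshold means no filtering.
function applyThreshold(hits: Hit[], threshold?: number): Hit[] {
  return threshold === undefined ? hits : hits.filter(hit => hit.score >= threshold);
}

const hits: Hit[] = [
  { id: 'a', score: 0.91 },
  { id: 'b', score: 0.7 },
  { id: 'c', score: 0.42 }
];

const strong = applyThreshold(hits, 0.7); // keeps a and b
const unfiltered = applyThreshold(hits);  // keeps all three
```

If recall starts returning nothing, lowering the threshold is the first adjustment to try, since a strict cutoff can reject every candidate.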

Optimize Indexes

Use HNSW indexes for PostgreSQL to improve query performance.

Troubleshooting

No relevant messages recalled:
  • Check that embeddings were created (verify vector store has data)
  • Lower the threshold value if set
  • Ensure scope matches your use case (thread vs resource)
  • Verify embedder dimensions match vector store index

Slow recall queries:
  • Use HNSW index type for PostgreSQL
  • Reduce topK value
  • Check vector store connection and query performance
  • Consider using a smaller embedding model

High token usage:
  • Reduce topK (fewer messages retrieved)
  • Reduce messageRange (less surrounding context)
  • Increase threshold (only highly relevant matches)
  • Balance with lastMessages to avoid redundancy
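A dimension mismatch between the embedder and the index is easy to check by hand when debugging. The helper below is a hypothetical sketch; how to read the configured dimension from a given vector store is not shown and depends on the provider.

```typescript
// Returns an error message when the embedding length does not match
// the dimension the index was created with, otherwise null.
function checkDimension(vector: number[], indexDimension: number): string | null {
  return vector.length === indexDimension
    ? null
    : `Embedding has ${vector.length} dimensions but the index expects ${indexDimension}`;
}

const sampleEmbedding: number[] = [];
for (let i = 0; i < 1536; i++) sampleEmbedding.push(0);

const matching = checkDimension(sampleEmbedding, 1536);
const mismatched = checkDimension(sampleEmbedding, 3072);
```

A mismatch typically appears after switching embedding models or changing the dimensions option without recreating the index.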

Next Steps

  • Working Memory: Store structured user information across conversations
  • RAG Overview: Learn about document-based RAG in Mastra
  • Conversation History: Manage recent message persistence
