Embeddings enable semantic search in EchoVault. Without embeddings, you still get fast keyword search via FTS5.

Why Embeddings?

Embeddings convert memories into vectors, allowing semantic search:
  • Keyword search: “JWT authentication” only matches exact keywords
  • Semantic search: “JWT authentication” also finds “Bearer token auth”, “stateless API auth”, etc.
EchoVault uses hybrid search — combining FTS5 keywords with semantic vectors for best results.
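
EchoVault's exact fusion method isn't documented here, but hybrid search generally means merging a keyword score list with a semantic score list into one ranking. A minimal sketch, assuming a simple normalized weighted sum (the weights and scores below are illustrative, not EchoVault's actual values):

```python
def hybrid_rank(keyword_scores, semantic_scores, k_weight=0.5, s_weight=0.5):
    """Merge two {doc_id: score} maps into a single ranked list of doc ids."""
    def normalize(scores):
        if not scores:
            return {}
        hi = max(scores.values())
        return {d: s / hi for d, s in scores.items()} if hi else scores

    kw, sem = normalize(keyword_scores), normalize(semantic_scores)
    combined = {d: k_weight * kw.get(d, 0.0) + s_weight * sem.get(d, 0.0)
                for d in set(kw) | set(sem)}
    return sorted(combined, key=combined.get, reverse=True)

# "bearer-token-auth" has no keyword hit for "JWT authentication",
# but a high semantic score still surfaces it near the top.
keyword = {"jwt-setup": 4.2, "api-keys": 1.1}          # e.g. FTS5/BM25 scores
semantic = {"jwt-setup": 0.91, "bearer-token-auth": 0.88, "api-keys": 0.30}
print(hybrid_rank(keyword, semantic))
# → ['jwt-setup', 'bearer-token-auth', 'api-keys']
```

This is why hybrid search beats either mode alone: exact keyword matches stay on top, while semantically related memories that share no keywords still make the list.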

Supported Providers

Ollama

Local, free, private

OpenAI

Cloud API, paid

vLLM

Self-hosted, OpenAI-compatible

Ollama (Local)

Run embeddings locally with Ollama. No API keys, no cloud, no cost.

Setup

  1. Install Ollama: https://ollama.ai/download
  2. Pull an embedding model:

     ollama pull nomic-embed-text

  3. Configure EchoVault:

     memory config init

  4. Edit ~/.memory/config.yaml:

     embedding:
       provider: ollama
       model: nomic-embed-text
       base_url: http://localhost:11434
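
With this configuration, EchoVault calls Ollama's public `/api/embeddings` route. To sanity-check connectivity yourself, you can build the same request directly; the sketch below targets Ollama's documented endpoint, not EchoVault's internal client:

```python
import json
import urllib.request

def build_embed_request(base_url, model, text):
    """Construct the URL and JSON body for Ollama's /api/embeddings endpoint."""
    url = f"{base_url.rstrip('/')}/api/embeddings"
    payload = {"model": model, "prompt": text}
    return url, payload

url, payload = build_embed_request(
    "http://localhost:11434", "nomic-embed-text", "Bearer token auth")
print(url)  # → http://localhost:11434/api/embeddings

# Uncomment to call a running Ollama instance:
# req = urllib.request.Request(url, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# vector = json.load(urllib.request.urlopen(req))["embedding"]
# print(len(vector))  # 768 for nomic-embed-text
```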

Configuration Options

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| provider | string | Yes | ollama | Must be ollama |
| model | string | Yes | nomic-embed-text | Ollama model name |
| base_url | string | No | http://localhost:11434 | Ollama API endpoint |
If Ollama is running on a different host or port, set base_url accordingly.
| Model | Size | Dimensions | Use Case |
|-------|------|------------|----------|
| nomic-embed-text | 274 MB | 768 | General-purpose, fast |
| mxbai-embed-large | 669 MB | 1024 | High accuracy |
| all-minilm | 46 MB | 384 | Lightweight, quick |
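
Dimensions also drive index size: stored as 32-bit floats, each vector costs roughly dimensions x 4 bytes. A back-of-envelope comparison (float32 storage is an assumption; the actual on-disk format may add overhead):

```python
# Per-vector storage at 4 bytes per float32; index overhead ignored.
for model, dims in [("all-minilm", 384), ("nomic-embed-text", 768),
                    ("mxbai-embed-large", 1024)]:
    kb = dims * 4 / 1024
    print(f"{model}: {kb:.1f} KiB per memory")
# nomic-embed-text -> 3.0 KiB, mxbai-embed-large -> 4.0 KiB
```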

Example: Custom Ollama Host

embedding:
  provider: ollama
  model: nomic-embed-text
  base_url: http://ollama.internal:11434

OpenAI (Cloud)

Use OpenAI’s cloud API for embeddings. Requires an API key.

Setup

  1. Get an API key: https://platform.openai.com/api-keys
  2. Configure EchoVault:

     memory config init

  3. Edit ~/.memory/config.yaml:

     embedding:
       provider: openai
       model: text-embedding-3-small
       base_url: https://api.openai.com/v1
       api_key: sk-proj-...

Configuration Options

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| provider | string | Yes |  | Must be openai |
| model | string | Yes | text-embedding-3-small | OpenAI model name |
| base_url | string | No | https://api.openai.com/v1 | API endpoint |
| api_key | string | Yes |  | OpenAI API key |
OpenAI API calls send memory content to OpenAI servers. If privacy is critical, use Ollama or vLLM instead.
| Model | Dimensions | Cost (per 1M tokens) |
|-------|------------|----------------------|
| text-embedding-3-small | 1536 | $0.02 |
| text-embedding-3-large | 3072 | $0.13 |
| text-embedding-ada-002 | 1536 | $0.10 |
text-embedding-3-small offers the best balance of performance and cost.
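
At these rates, embedding cost is easy to estimate. A back-of-envelope calculation (the memory count and average token length below are illustrative assumptions, not measured figures):

```python
# Cost estimate for text-embedding-3-small at $0.02 per 1M tokens.
PRICE_PER_MILLION = 0.02
memories = 10_000       # assumed number of memories
avg_tokens = 200        # assumed average tokens per memory

total_tokens = memories * avg_tokens            # 2,000,000 tokens
cost = total_tokens / 1_000_000 * PRICE_PER_MILLION
print(f"${cost:.2f}")  # → $0.04
```

Even a large vault embeds for a few cents; reindexing (see below) is similarly cheap in dollars, though not in privacy.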

vLLM (Self-Hosted)

vLLM is an OpenAI-compatible inference server. Host your own embedding models on-premises.

Setup

  1. Deploy vLLM with an embedding model
  2. Note the endpoint URL (typically http://your-host:8000/v1)
  3. Configure EchoVault:

     embedding:
       provider: openai
       model: BAAI/bge-small-en-v1.5
       base_url: http://vllm.internal:8000/v1
       # api_key: optional-auth-token
Use provider: openai for vLLM since it implements the OpenAI embeddings API.
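
Concretely, "OpenAI-compatible" means the request body sent to `{base_url}/embeddings` has the same shape for both providers; only the base URL and model name differ. A sketch of that shared payload (the model names are the ones used in the examples above):

```python
# vLLM implements the OpenAI embeddings API, so the request body is
# identical for both providers -- only base_url and model name change.
def embed_payload(model, texts):
    return {"model": model, "input": texts}

openai_req = embed_payload("text-embedding-3-small", ["Bearer token auth"])
vllm_req = embed_payload("BAAI/bge-small-en-v1.5", ["Bearer token auth"])
print(sorted(openai_req) == sorted(vllm_req))  # → True
```

This is the whole trick behind reusing `provider: openai` for vLLM.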

Configuration Options

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| provider | string | Yes | Set to openai |
| model | string | Yes | Model name exposed by your vLLM instance |
| base_url | string | Yes | vLLM endpoint (e.g., http://host:8000/v1) |
| api_key | string | No | Auth token if your vLLM gateway requires it |

Example: On-Premises vLLM

embedding:
  provider: openai
  model: intfloat/e5-large-v2
  base_url: http://vllm.company.internal:8000/v1
  api_key: vllm-auth-token-123

Verify Configuration

After editing config.yaml, verify your setup:
memory config
Output should show:
embedding:
  provider: ollama
  model: nomic-embed-text
  base_url: http://localhost:11434
  api_key: null
context:
  semantic: auto
  topup_recent: true
memory_home: /Users/username/.memory
memory_home_source: default
API keys are automatically redacted in memory config output.

Reindex After Changing Providers

If you change embedding providers or models, rebuild the vector index:
memory reindex
This re-embeds all existing memories with the new provider.
Reindexing sends all memories to the new provider. If switching from local (Ollama) to cloud (OpenAI), be aware that memory content will be sent to OpenAI’s API.
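
Reindexing is necessary because vectors from different models aren't comparable: they often differ in dimension (768 for nomic-embed-text vs. 1536 for text-embedding-3-small) and, even at equal dimensions, live in unrelated spaces. A toy illustration of the dimension check any vector store must make (not EchoVault's actual code):

```python
def cosine(a, b):
    """Cosine similarity; only defined for equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

old_vec = [0.1] * 768     # stored with nomic-embed-text
new_query = [0.1] * 1536  # embedded with text-embedding-3-small
try:
    cosine(old_vec, new_query)
except ValueError as e:
    print(e)  # → dimension mismatch: 768 vs 1536
```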

Testing Embeddings

Save a test memory and search for it:
memory save \
  --title "Test semantic search" \
  --what "Testing vector embeddings" \
  --tags "test" \
  --category "context"

memory search "vector search"
If semantic search is working, the memory should be found even though “vector search” doesn’t exactly match “vector embeddings”.

Troubleshooting

Ollama Not Responding

Error: Connection refused or timeout

Solution:
  1. Check if Ollama is running: ollama list
  2. Verify the port: curl http://localhost:11434/api/ps
  3. Update base_url in config.yaml if using a custom host/port

OpenAI Authentication Failed

Error: 401 Unauthorized

Solution:
  1. Verify API key is correct
  2. Check key has not expired
  3. Ensure base_url is https://api.openai.com/v1

Model Not Found

Error: model not found or 404

Solution:
  • Ollama: Pull the model: ollama pull nomic-embed-text
  • OpenAI: Verify model name matches OpenAI’s docs
  • vLLM: Check model name matches what vLLM is serving

Next Steps

Reindex Memories

Rebuild vectors after configuration changes

Context Configuration

Control how memories are retrieved
