## Why Embeddings?

Embeddings convert memories into vectors, enabling semantic search:

- Keyword search: “JWT authentication” only matches the exact keywords
- Semantic search: “JWT authentication” also finds “Bearer token auth”, “stateless API auth”, etc.
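The idea can be sketched in a few lines: texts are mapped to vectors, and search ranks memories by vector similarity rather than keyword overlap. The 3-D vectors below are toy values for illustration, not real model output (real embedding models produce hundreds to thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 = same direction, ~0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": semantically similar texts get nearby vectors.
vectors = {
    "JWT authentication": [0.90, 0.80, 0.10],
    "Bearer token auth":  [0.85, 0.75, 0.20],
    "chocolate cake":     [0.10, 0.20, 0.90],
}

query = vectors["JWT authentication"]
for text, vec in vectors.items():
    print(f"{text}: {cosine_similarity(query, vec):.3f}")
```

Despite sharing no keywords with the query, “Bearer token auth” scores far higher than the unrelated text, which is exactly what semantic search exploits.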
## Supported Providers

| Provider | Description |
|---|---|
| Ollama | Local, free, private |
| OpenAI | Cloud API, paid |
| vLLM | Self-hosted, OpenAI-compatible |
## Ollama (Local)

Run embeddings locally with Ollama. No API keys, no cloud, no cost.

### Setup

- Install Ollama: https://ollama.ai/download
- Pull an embedding model: `ollama pull nomic-embed-text`
- Configure EchoVault by editing `~/.memory/config.yaml`:
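A minimal config sketch, assuming the embedding settings live under a top-level `embedding:` key (the key name is an assumption; the field names come from the table below):

```yaml
# Sketch of ~/.memory/config.yaml for Ollama; the `embedding:`
# nesting is an assumption about EchoVault's config layout.
embedding:
  provider: ollama
  model: nomic-embed-text
  base_url: http://localhost:11434   # optional; this is the default
```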
### Configuration Options

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `provider` | string | Yes | `ollama` | Must be `ollama` |
| `model` | string | Yes | `nomic-embed-text` | Ollama model name |
| `base_url` | string | No | `http://localhost:11434` | Ollama API endpoint |
If Ollama is running on a different host or port, set `base_url` accordingly.

### Recommended Models
| Model | Size | Dimensions | Use Case |
|---|---|---|---|
| `nomic-embed-text` | 274 MB | 768 | General-purpose, fast |
| `mxbai-embed-large` | 669 MB | 1024 | High accuracy |
| `all-minilm` | 46 MB | 384 | Lightweight, quick |
### Example: Custom Ollama Host
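A hedged sketch of a custom-host setup; the host and port are placeholders, and the `embedding:` key is an assumption about EchoVault's config layout:

```yaml
embedding:
  provider: ollama
  model: nomic-embed-text
  base_url: http://192.168.1.50:11434   # Ollama running on another machine
```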
## OpenAI (Cloud)

Use OpenAI’s cloud API for embeddings. Requires an API key.

### Setup

- Get an API key: https://platform.openai.com/api-keys
- Configure EchoVault by editing `~/.memory/config.yaml`:
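A minimal sketch for OpenAI, again assuming a top-level `embedding:` key (the nesting is an assumption; field names come from the table below):

```yaml
embedding:
  provider: openai
  model: text-embedding-3-small
  api_key: sk-...                       # your OpenAI API key
  base_url: https://api.openai.com/v1   # optional; this is the default
```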
### Configuration Options

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `provider` | string | Yes | — | Must be `openai` |
| `model` | string | Yes | `text-embedding-3-small` | OpenAI model name |
| `base_url` | string | No | `https://api.openai.com/v1` | API endpoint |
| `api_key` | string | Yes | — | OpenAI API key |
### Recommended Models

| Model | Dimensions | Cost (per 1M tokens) |
|---|---|---|
| `text-embedding-3-small` | 1536 | $0.02 |
| `text-embedding-3-large` | 3072 | $0.13 |
| `text-embedding-ada-002` | 1536 | $0.10 |
## vLLM (Self-Hosted)

vLLM is an OpenAI-compatible inference server. Host your own embedding models on-premises.

### Setup

- Deploy vLLM with an embedding model
- Note the endpoint URL (typically `http://your-host:8000/v1`)
- Configure EchoVault:

Use `provider: openai` for vLLM, since it implements the OpenAI embeddings API.

### Configuration Options
| Field | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Set to `openai` |
| `model` | string | Yes | Model name exposed by your vLLM instance |
| `base_url` | string | Yes | vLLM endpoint (e.g., `http://host:8000/v1`) |
| `api_key` | string | No | Auth token if your vLLM gateway requires it |
### Example: On-Premises vLLM
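A hedged sketch of an on-premises vLLM setup; the host, model name, and `embedding:` nesting are all placeholders or assumptions — match them to your deployment:

```yaml
embedding:
  provider: openai                 # vLLM implements the OpenAI embeddings API
  model: BAAI/bge-large-en-v1.5    # must match the model your vLLM instance serves
  base_url: http://vllm.internal:8000/v1
  api_key: ""                      # only needed if your gateway requires auth
```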
## Verify Configuration

After editing `config.yaml`, verify your setup:
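As a provider-level sanity check (independent of EchoVault's own commands), you can query the configured endpoint directly:

```shell
# Ollama: confirm the server responds and see which models are loaded
curl http://localhost:11434/api/ps

# OpenAI: confirm the key authenticates (requires $OPENAI_API_KEY)
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```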
## Reindex After Changing Providers

If you change embedding providers or models, rebuild the vector index; different models produce vectors with different dimensions, so previously stored vectors are incompatible with the new model.

## Testing Embeddings
Save a test memory and search for it:

## Troubleshooting
### Ollama Not Responding

Error: `Connection refused` or timeout

Solution:

- Check that Ollama is running: `ollama list`
- Verify the port: `curl http://localhost:11434/api/ps`
- Update `base_url` in `config.yaml` if using a custom host/port
### OpenAI Authentication Failed

Error: `401 Unauthorized`

Solution:

- Verify the API key is correct
- Check that the key has not expired
- Ensure `base_url` is `https://api.openai.com/v1`
### Model Not Found

Error: `model not found` or `404`

Solution:

- Ollama: pull the model with `ollama pull nomic-embed-text`
- OpenAI: verify the model name matches OpenAI’s docs
- vLLM: check that the model name matches what vLLM is serving
## Next Steps

- Reindex Memories: rebuild vectors after configuration changes
- Context Configuration: control how memories are retrieved