When you upload a document to AnythingLLM, the collector splits it into overlapping text chunks and passes each chunk through an embedding model, which converts the text into a high-dimensional numeric vector. Those vectors are stored in your chosen vector database and later used to find the most semantically relevant chunks for each chat query. The quality, dimensionality, and context-window size of the embedding model therefore directly affect retrieval accuracy, so choosing the right embedder for your use case is as important as choosing the right LLM.
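The chunk-then-retrieve flow described above can be illustrated with a minimal sketch. It is not AnythingLLM's implementation: the function names are hypothetical, the two-dimensional vectors are hand-written stand-ins for real embeddings, and cosine similarity is assumed as the distance metric (the actual metric depends on your vector database).

```python
import math


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into character chunks; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 2):
    """Return the k (chunk, score) pairs most similar to the query vector."""
    scored = [(chunk, cosine_similarity(query_vec, vec)) for chunk, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


chunks = split_with_overlap("abcdefghij", chunk_size=4, overlap=1)

# Toy "embeddings": real vectors have hundreds or thousands of dimensions.
stored_vectors = [
    ("cats", [1.0, 0.0]),
    ("dogs", [0.8, 0.6]),
    ("tax law", [0.0, 1.0]),
]
best = top_k([1.0, 0.1], stored_vectors, k=2)
```

With these inputs, `chunks` holds three four-character pieces that each share one character with the next, and `best` ranks "cats" ahead of "dogs" because their vectors point closest to the query.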
Key Environment Variables
EMBEDDING_ENGINE: Selects which embedding engine to use. Valid values: native, openai, azure, ollama, lmstudio, localai, cohere, voyageai, gemini, mistral, openrouter, lemonade, litellm, generic-openai.

EMBEDDING_MODEL_PREF: The specific model identifier to use for embeddings. The correct value depends on the engine; see each engine section below for examples.

EMBEDDING_MODEL_MAX_CHUNK_LENGTH: Maximum character length of a single chunk passed to the embedder. Tune this to stay within the model’s token budget. Typical values range from 1000 (conservative) to 8192 (most modern models).

EMBEDDING_BASE_PATH: Base URL for self-hosted embedding engines. Used by Ollama, LM Studio, LocalAI, Lemonade, LiteLLM, and Generic OpenAI.
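As a rough illustration of how these settings constrain each other, the sketch below checks an engine name against the valid values listed above and verifies that a chunk length is a positive integer. The helper name is hypothetical, and `EMBEDDING_MODEL_MAX_CHUNK_LENGTH` is assumed here to be the variable that carries the chunk-length setting; this is a pre-flight check you might run yourself, not something AnythingLLM ships.

```python
VALID_ENGINES = {
    "native", "openai", "azure", "ollama", "lmstudio", "localai", "cohere",
    "voyageai", "gemini", "mistral", "openrouter", "lemonade", "litellm",
    "generic-openai",
}


def check_embedder_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems found in the embedder settings."""
    problems = []

    engine = env.get("EMBEDDING_ENGINE", "")
    if engine not in VALID_ENGINES:
        problems.append(f"unknown EMBEDDING_ENGINE: {engine!r}")

    # Assumed variable name for the chunk-length setting.
    raw_length = env.get("EMBEDDING_MODEL_MAX_CHUNK_LENGTH")
    if raw_length is not None:
        try:
            valid_length = int(raw_length) > 0
        except ValueError:
            valid_length = False
        if not valid_length:
            problems.append(f"chunk length must be a positive integer: {raw_length!r}")

    return problems


good = check_embedder_env(
    {"EMBEDDING_ENGINE": "ollama", "EMBEDDING_MODEL_MAX_CHUNK_LENGTH": "8192"}
)
bad = check_embedder_env(
    {"EMBEDDING_ENGINE": "olama", "EMBEDDING_MODEL_MAX_CHUNK_LENGTH": "big"}
)
```

In practice you could pass `dict(os.environ)` to the same function before starting the server.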
Supported Embedding Engines
Native (built-in — no API key needed)
The native embedder bundles a small transformer model directly inside AnythingLLM using the Xenova Transformers.js runtime. It runs entirely in-process, requires no external service, and works offline out of the box.
OpenAI
Uses the OpenAI Embeddings API. Delivers excellent retrieval quality and supports the latest generation of embedding models.

Available models:

| Model | Dimensions | Notes |
|---|---|---|
| text-embedding-ada-002 | 1,536 | Legacy default, widely compatible |
| text-embedding-3-small | 1,536 | Faster and cheaper than ada-002 |
| text-embedding-3-large | 3,072 | Highest accuracy, higher cost |
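Dimensionality also affects storage. The back-of-the-envelope sketch below estimates the raw size of the stored vectors for each model in the table, assuming 4-byte float32 components and ignoring index overhead, metadata, and any compression your vector database applies. The function name is hypothetical.

```python
# Dimensions taken from the table above.
DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def raw_vector_bytes(model: str, chunk_count: int) -> int:
    """Raw bytes needed to store one vector per chunk for the given model."""
    return DIMENSIONS[model] * BYTES_PER_FLOAT32 * chunk_count


small = raw_vector_bytes("text-embedding-3-small", 10_000)
large = raw_vector_bytes("text-embedding-3-large", 10_000)
print(f"3-small: {small / 1e6:.2f} MB, 3-large: {large / 1e6:.2f} MB")
```

Under these assumptions, 10,000 chunks need about 61 MB of raw vector data with text-embedding-3-small and twice that with text-embedding-3-large.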
Azure OpenAI
Uses an embedding model deployed in your Azure OpenAI resource. The model preference is your Azure deployment name, not the underlying base model name.

The underlying base model must be text-embedding-ada-002 or one of the text-embedding-3-* variants.
Ollama
Runs embedding models locally through Ollama. Pull an embedding model with ollama pull nomic-embed-text before enabling this engine.

Popular Ollama embedding models include nomic-embed-text, mxbai-embed-large, and all-minilm.
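To confirm that a pulled model actually serves embeddings, you can query Ollama directly. The sketch below only builds the HTTP request; it assumes Ollama's default address of http://localhost:11434 and the `/api/embed` endpoint found in recent Ollama releases (older builds expose `/api/embeddings` with a `prompt` field instead). The helper name is hypothetical, and this is not necessarily the call AnythingLLM makes internally.

```python
import json
import urllib.request


def build_ollama_embed_request(
    base_url: str, model: str, texts: list[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's embed endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ollama_embed_request(
    "http://localhost:11434", "nomic-embed-text", ["hello world"]
)

# To send it (requires a running Ollama server with the model pulled):
# with urllib.request.urlopen(request) as response:
#     vectors = json.loads(response.read())["embeddings"]
```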
LM Studio
Uses an embedding model loaded in LM Studio’s local server.
LocalAI
LocalAI can serve embedding models alongside LLMs through its OpenAI-compatible API.
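An OpenAI-compatible embeddings endpoint accepts a JSON body with `model` and `input` and returns a `data` list whose items carry an `index` and an `embedding`. The sketch below shows that shape with hypothetical helper names and a hand-written sample response; the model name is only a placeholder for whatever your LocalAI instance serves.

```python
def embeddings_payload(model: str, texts: list[str]) -> dict:
    """Request body for POST <base URL>/embeddings on an OpenAI-compatible server."""
    return {"model": model, "input": texts}


def vectors_from_response(response: dict) -> list[list[float]]:
    """Extract vectors in input order; items are sorted by their index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


payload = embeddings_payload("text-embedding-ada-002", ["first chunk", "second chunk"])

# Hand-written sample in the OpenAI response shape, deliberately out of order.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}
vectors = vectors_from_response(sample_response)
```

Sorting by `index` keeps each vector aligned with the chunk it was computed from, which matters when chunks are embedded in batches.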
Cohere
Uses Cohere’s embedding API. Cohere offers multilingual and domain-specialised embedding models.

Other notable models: embed-multilingual-v3.0, embed-english-light-v3.0.
Google Gemini
Uses the Google Gemini embedding API.
Mistral
Uses Mistral’s embedding endpoint.
VoyageAI
VoyageAI specialises in high-quality retrieval embeddings, with models fine-tuned for code, finance, law, and general content.

Other models: voyage-2, voyage-code-2, voyage-finance-2, voyage-law-2.
OpenRouter
Routes embedding requests through the OpenRouter API, giving access to multiple embedding providers under a single key.
Generic OpenAI (any compatible endpoint)
Use this engine for any OpenAI-compatible embedding endpoint not covered by the named engines above — vLLM, Xinference, custom proxies, etc.
GENERIC_OPEN_AI_EMBEDDING_QUERY_PREFIX and GENERIC_OPEN_AI_EMBEDDING_PASSAGE_PREFIX let you prepend the instruction strings that some instruction-tuned embedding models require (e.g., "Represent this sentence for searching relevant passages: "). The query prefix applies to search queries and the passage prefix to document chunks.
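The effect of these two prefixes can be sketched as follows. The helper is hypothetical and only illustrates the idea of prepending a fixed string before text is sent to the embedder; the exact point at which AnythingLLM applies the prefixes is not shown here.

```python
def with_prefix(texts: list[str], prefix: str) -> list[str]:
    """Prepend the instruction prefix to every text; an empty prefix is a no-op."""
    if not prefix:
        return list(texts)
    return [f"{prefix}{text}" for text in texts]


QUERY_PREFIX = "Represent this sentence for searching relevant passages: "
PASSAGE_PREFIX = ""  # many models need a prefix only on the query side

queries = with_prefix(["how do I rotate API keys?"], QUERY_PREFIX)
passages = with_prefix(["Keys can be rotated from the settings page."], PASSAGE_PREFIX)
```

Whether a model needs a query prefix, a passage prefix, both, or neither is stated on its model card, so check there before setting either variable.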
LiteLLM
Routes embedding requests through a locally running LiteLLM proxy, giving you access to hundreds of backends through one endpoint.
Lemonade
Lemonade is an AMD ROCm-optimised local inference server that also supports embedding models.
Engine Quick-Reference
| EMBEDDING_ENGINE | Requires API key | Self-hosted | Recommended model |
|---|---|---|---|
| native | No | Yes (in-process) | Xenova/all-MiniLM-L6-v2 |
| openai | Yes | No | text-embedding-3-small |
| azure | Yes | No | (your deployment name) |
| ollama | No | Yes | nomic-embed-text:latest |
| lmstudio | No | Yes | (loaded model) |
| localai | Optional | Yes | text-embedding-ada-002 |
| cohere | Yes | No | embed-english-v3.0 |
| gemini | Yes | No | text-embedding-004 |
| mistral | Yes | No | mistral-embed |
| voyageai | Yes | No | voyage-large-2-instruct |
| openrouter | Yes | No | baai/bge-m3 |
| generic-openai | Optional | Either | (model-dependent) |
| litellm | Optional | Yes | (model-dependent) |
| lemonade | No | Yes | Qwen3-embedder |
Note: EMBEDDING_ENGINE and EMBEDDING_MODEL_PREF values set via environment variables take precedence over any values stored in the AnythingLLM database. Remove these variables from .env if you want to manage the embedder via the UI.
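The precedence rule can be pictured with a small sketch. The function and the two dictionaries are hypothetical stand-ins for the process environment and the stored settings, and treating an empty string as "not set" is an assumption made for this illustration.

```python
from typing import Optional


def effective_setting(key: str, env: dict[str, str], db: dict[str, str]) -> Optional[str]:
    """Return the environment value if present, otherwise the stored value."""
    env_value = env.get(key)
    if env_value:  # assumption: unset or empty means "not configured"
        return env_value
    return db.get(key)


env_settings = {"EMBEDDING_ENGINE": "ollama"}
db_settings = {"EMBEDDING_ENGINE": "openai", "EMBEDDING_MODEL_PREF": "text-embedding-3-small"}

engine = effective_setting("EMBEDDING_ENGINE", env_settings, db_settings)
model = effective_setting("EMBEDDING_MODEL_PREF", env_settings, db_settings)
```

Here `engine` resolves to the environment value and `model` falls back to the stored one, which is why a value chosen in the UI appears to have no effect while the matching variable is still present in .env.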