When you upload a document to AnythingLLM, the collector splits it into overlapping text chunks and passes each chunk through an embedding model, which converts the text into a high-dimensional numeric vector. Those vectors are stored in your chosen vector database and later used to find the most semantically relevant chunks for each chat query. The quality, dimensionality, and context-window size of the embedding model therefore directly affect retrieval accuracy, so choosing the right embedder for your use case is as important as choosing the right LLM.
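The chunk, embed, and retrieve flow described above can be sketched in a few lines. This is a minimal illustration, not AnythingLLM's implementation: the function names are hypothetical, chunking is by plain character windows, and a word-count vector stands in for a real embedding model.

```python
import math
from collections import Counter


def split_into_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Cut text into character windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


def toy_embed(text: str) -> Counter:
    """Assumption: a sparse word-count vector stands in for a real embedding model."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vector = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]
```

A real embedder replaces `toy_embed` with a model call that returns a dense vector, and the vector database performs the similarity search instead of the in-memory sort.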
Changing your embedding model after you have already embedded documents requires you to re-embed every workspace. The vectors produced by one model are incompatible with those from another. AnythingLLM will warn you in the UI when this situation is detected. Plan your embedder choice before ingesting large document collections.
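To make the incompatibility concrete, the sketch below compares the embedder that produced the stored vectors with the one currently configured. The `EmbedderInfo` type and `reembed_reason` helper are hypothetical and are not how AnythingLLM detects the situation. Note that two models with the same dimensionality still produce incompatible vectors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmbedderInfo:
    """Hypothetical record of which model produced a set of vectors."""
    model: str
    dimensions: int


def reembed_reason(stored: EmbedderInfo, active: EmbedderInfo) -> Optional[str]:
    """Return why stored vectors cannot be reused, or None if they can."""
    if stored.dimensions != active.dimensions:
        return "dimension mismatch: vectors cannot even be compared"
    if stored.model != active.model:
        return "different model: same size, but a different vector space"
    return None
```

For example, switching from text-embedding-ada-002 to text-embedding-3-small keeps 1536 dimensions but still calls for re-embedding, because the two models place text in unrelated vector spaces.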

Key Environment Variables

EMBEDDING_ENGINE (string, default: "native")
Selects which embedding engine to use. Valid values: native, openai, azure, ollama, lmstudio, localai, cohere, voyageai, gemini, mistral, openrouter, lemonade, litellm, generic-openai.

EMBEDDING_MODEL_PREF (string)
The specific model identifier to use for embeddings. The correct value depends on the engine; see each engine section below for examples.

EMBEDDING_MODEL_MAX_CHUNK_LENGTH (number)
Maximum length, in characters, of a single chunk passed to the embedder. Tune this to stay within the model's token budget. Typical values range from 1000 (conservative) to 8192 (most modern models).

EMBEDDING_BASE_PATH (string)
Base URL for self-hosted embedding engines. Used by Ollama, LM Studio, LocalAI, Lemonade, and Generic OpenAI. The LiteLLM example on this page sets LITE_LLM_BASE_PATH instead.
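A quick pre-flight check of these variables can catch typos before the server starts. The sketch below is a hypothetical helper, not part of AnythingLLM. It assumes the values have already been parsed from .env into a mapping, and the base-path requirements are inferred from the examples on this page rather than from the server's own validation.

```python
from typing import Mapping

# Engine names as listed for EMBEDDING_ENGINE on this page.
VALID_ENGINES = {
    "native", "openai", "azure", "ollama", "lmstudio", "localai", "cohere",
    "voyageai", "gemini", "mistral", "openrouter", "lemonade", "litellm",
    "generic-openai",
}

# Assumption: which base-path variable each self-hosted engine reads,
# taken from the example blocks on this page.
BASE_PATH_VARIABLE = {
    "ollama": "EMBEDDING_BASE_PATH",
    "lmstudio": "EMBEDDING_BASE_PATH",
    "localai": "EMBEDDING_BASE_PATH",
    "lemonade": "EMBEDDING_BASE_PATH",
    "generic-openai": "EMBEDDING_BASE_PATH",
    "litellm": "LITE_LLM_BASE_PATH",
}


def validate_embedding_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: list[str] = []
    engine = env.get("EMBEDDING_ENGINE", "native")  # documented default
    if engine not in VALID_ENGINES:
        return [f"EMBEDDING_ENGINE={engine!r} is not a supported engine"]

    base_variable = BASE_PATH_VARIABLE.get(engine)
    if base_variable and not env.get(base_variable):
        problems.append(f"{engine} needs {base_variable} to be set")

    raw_length = env.get("EMBEDDING_MODEL_MAX_CHUNK_LENGTH")
    if raw_length is not None and not (raw_length.isdigit() and int(raw_length) > 0):
        problems.append("EMBEDDING_MODEL_MAX_CHUNK_LENGTH must be a positive integer")
    return problems
```

Calling `validate_embedding_env(dict(os.environ))` after `import os` would run the same checks against the current process environment.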

Supported Embedding Engines

The native embedder bundles a small transformer model directly inside AnythingLLM using the Xenova Transformers.js runtime. It runs entirely in-process, requires no external service, and works offline out of the box.
EMBEDDING_ENGINE='native'
EMBEDDING_MODEL_PREF='Xenova/all-MiniLM-L6-v2'
The native embedder is the best choice for local-first, single-user, or air-gapped deployments where you want zero external dependencies. Xenova/all-MiniLM-L6-v2 is a fast, high-quality 384-dimensional model suitable for most document retrieval tasks.
The openai engine uses the OpenAI Embeddings API. It delivers strong retrieval quality and supports OpenAI's current text-embedding-3 models as well as the older ada-002 model.
EMBEDDING_ENGINE='openai'
OPEN_AI_KEY=sk-...
EMBEDDING_MODEL_PREF='text-embedding-ada-002'
Available models:

| Model | Dimensions | Notes |
| --- | --- | --- |
| text-embedding-ada-002 | 1536 | Legacy default, widely compatible |
| text-embedding-3-small | 1536 | Faster and cheaper than ada-002 |
| text-embedding-3-large | 3072 | Highest accuracy, higher cost |
For new cloud deployments, text-embedding-3-small offers the best balance of cost, speed, and retrieval quality.
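Dimensionality also drives storage cost. As a rough back-of-the-envelope sketch, assuming each vector component is stored as a 4-byte float and ignoring index and metadata overhead (both assumptions, since actual vector databases vary), the raw footprint can be estimated like this:

```python
def vector_bytes(dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of one vector; 4 bytes per value assumes float32 storage."""
    return dimensions * bytes_per_value


def collection_mib(chunk_count: int, dimensions: int, bytes_per_value: int = 4) -> float:
    """Raw size in MiB of `chunk_count` vectors, excluding index and metadata overhead."""
    return chunk_count * vector_bytes(dimensions, bytes_per_value) / (1024 * 1024)
```

Under those assumptions, 100,000 chunks take about 586 MiB at 1536 dimensions and about 1172 MiB at 3072 dimensions, so text-embedding-3-large doubles the raw vector storage relative to the 1536-dimensional models.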
The azure engine uses an embedding model deployed in your Azure OpenAI resource. Set EMBEDDING_MODEL_PREF to your Azure deployment name, not the underlying base model name.
EMBEDDING_ENGINE='azure'
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_KEY=...
EMBEDDING_MODEL_PREF='my-embedder-model'
The underlying base model must be text-embedding-ada-002 or one of the text-embedding-3-* variants.
The ollama engine runs embedding models locally through Ollama. Before enabling it, pull an embedding model, for example by running ollama pull nomic-embed-text.
EMBEDDING_ENGINE='ollama'
EMBEDDING_BASE_PATH='http://host.docker.internal:11434'
EMBEDDING_MODEL_PREF='nomic-embed-text:latest'
EMBEDDING_MODEL_MAX_CHUNK_LENGTH=8192
Popular Ollama embedding models include nomic-embed-text, mxbai-embed-large, and all-minilm.
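Before pointing AnythingLLM at Ollama, it can help to confirm that the model answers embedding requests. The sketch below targets Ollama's /api/embed endpoint using only the standard library. The helper names are hypothetical, and the base URL assumes you run the check from the host machine (host.docker.internal generally resolves only from inside a container).

```python
import json
import urllib.request


def build_ollama_embed_request(base_path: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST to Ollama's /api/embed endpoint."""
    url = base_path.rstrip("/") + "/api/embed"
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_embed(base_path: str, model: str, texts: list[str], timeout: float = 30.0) -> list[list[float]]:
    """Send the request and return one vector per input text."""
    request = build_ollama_embed_request(base_path, model, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["embeddings"]
```

With Ollama running locally, `len(ollama_embed("http://localhost:11434", "nomic-embed-text:latest", ["hello"])[0])` reports the model's vector dimensionality.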
The lmstudio engine uses an embedding model loaded in LM Studio’s local server.
EMBEDDING_ENGINE='lmstudio'
EMBEDDING_BASE_PATH='http://host.docker.internal:1234/v1'
EMBEDDING_MODEL_PREF='nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q4_0.gguf'
EMBEDDING_MODEL_MAX_CHUNK_LENGTH=8192
LocalAI can serve embedding models alongside LLMs through its OpenAI-compatible API.
EMBEDDING_ENGINE='localai'
EMBEDDING_BASE_PATH='http://localhost:8080/v1'
EMBEDDING_MODEL_PREF='text-embedding-ada-002'
EMBEDDING_MODEL_MAX_CHUNK_LENGTH=1000
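Engines that expose an OpenAI-compatible API accept the same request shape: a POST to the embeddings route under the base path, with a model name and a list of inputs. The sketch below builds such a request and parses the standard response. The helper names are hypothetical, and whether an API key is needed depends on how your server is configured.

```python
import json
import urllib.request
from typing import Optional


def build_embeddings_request(
    base_path: str, model: str, texts: list[str], api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style POST to <base_path>/embeddings."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        base_path.rstrip("/") + "/embeddings",
        data=body,
        headers=headers,
        method="POST",
    )


def parse_embeddings(response: dict) -> list[list[float]]:
    """Extract vectors from an OpenAI-style response, ordered by input index."""
    rows = sorted(response["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]
```

Sending the request with `urllib.request.urlopen` and passing the decoded JSON to `parse_embeddings` returns one vector per input, in the order the inputs were given.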
The cohere engine uses Cohere’s embedding API. Cohere offers multilingual and domain-specialised embedding models.
EMBEDDING_ENGINE='cohere'
COHERE_API_KEY=...
EMBEDDING_MODEL_PREF='embed-english-v3.0'
Other notable models: embed-multilingual-v3.0, embed-english-light-v3.0.
The gemini engine uses the Google Gemini embedding API.
EMBEDDING_ENGINE='gemini'
GEMINI_EMBEDDING_API_KEY=...
EMBEDDING_MODEL_PREF='text-embedding-004'
The mistral engine uses Mistral’s embedding endpoint.
EMBEDDING_ENGINE='mistral'
MISTRAL_API_KEY=...
EMBEDDING_MODEL_PREF='mistral-embed'
VoyageAI specialises in high-quality retrieval embeddings, with models fine-tuned for code, finance, law, and general content.
EMBEDDING_ENGINE='voyageai'
VOYAGEAI_API_KEY=...
EMBEDDING_MODEL_PREF='voyage-large-2-instruct'
Other models: voyage-2, voyage-code-2, voyage-finance-2, voyage-law-2.
The openrouter engine routes embedding requests through the OpenRouter API, giving access to multiple embedding providers under a single key.
EMBEDDING_ENGINE='openrouter'
OPENROUTER_API_KEY='...'
EMBEDDING_MODEL_PREF='baai/bge-m3'
Use the generic-openai engine for any OpenAI-compatible embedding endpoint not covered by the named engines above, such as vLLM, Xinference, or a custom proxy.
EMBEDDING_ENGINE='generic-openai'
EMBEDDING_BASE_PATH='http://127.0.0.1:4000'
EMBEDDING_MODEL_PREF='text-embedding-ada-002'
EMBEDDING_MODEL_MAX_CHUNK_LENGTH=8192
GENERIC_OPEN_AI_EMBEDDING_API_KEY='sk-123abc'
# Optional tuning:
# GENERIC_OPEN_AI_EMBEDDING_MAX_CONCURRENT_CHUNKS=500
# GENERIC_OPEN_AI_EMBEDDING_API_DELAY_MS=1000
# GENERIC_OPEN_AI_EMBEDDING_QUERY_PREFIX=
# GENERIC_OPEN_AI_EMBEDDING_PASSAGE_PREFIX=
GENERIC_OPEN_AI_EMBEDDING_QUERY_PREFIX and GENERIC_OPEN_AI_EMBEDDING_PASSAGE_PREFIX let you prepend instruction strings required by some instruction-tuned embedding models (e.g., "Represent this sentence for searching relevant passages: ").
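The effect of the two prefix variables can be illustrated as follows. This is a sketch of the idea rather than AnythingLLM's code: `apply_prefix` is a hypothetical helper, and it assumes the prefix is simply concatenated in front of each text before the request is sent.

```python
from typing import Mapping


def apply_prefix(texts: list[str], prefix: str) -> list[str]:
    """Prepend `prefix` to every text; an empty prefix leaves the texts unchanged."""
    if not prefix:
        return list(texts)
    return [prefix + text for text in texts]


def prepare_inputs(env: Mapping[str, str], texts: list[str], is_query: bool) -> list[str]:
    """Pick the query or passage prefix from the environment mapping and apply it."""
    variable = (
        "GENERIC_OPEN_AI_EMBEDDING_QUERY_PREFIX"
        if is_query
        else "GENERIC_OPEN_AI_EMBEDDING_PASSAGE_PREFIX"
    )
    return apply_prefix(texts, env.get(variable, ""))
```

Queries and passages are prefixed separately because some instruction-tuned models expect an instruction only on the query side, so the passage prefix can stay empty.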
The litellm engine routes embedding requests through a locally running LiteLLM proxy, giving you access to the backends LiteLLM supports through one endpoint.
EMBEDDING_ENGINE='litellm'
LITE_LLM_BASE_PATH='http://127.0.0.1:4000'
LITE_LLM_API_KEY='sk-123abc'
EMBEDDING_MODEL_PREF='text-embedding-ada-002'
EMBEDDING_MODEL_MAX_CHUNK_LENGTH=8192
Lemonade is an AMD ROCm-optimised local inference server that also supports embedding models.
EMBEDDING_ENGINE='lemonade'
EMBEDDING_BASE_PATH='http://127.0.0.1:8000'
EMBEDDING_MODEL_PREF='Qwen3-embedder'
EMBEDDING_MODEL_MAX_CHUNK_LENGTH=8192

Engine Quick-Reference

| EMBEDDING_ENGINE | Requires API key | Self-hosted | Recommended model |
| --- | --- | --- | --- |
| native | No | Yes (in-process) | Xenova/all-MiniLM-L6-v2 |
| openai | Yes | No | text-embedding-3-small |
| azure | Yes | No | (your deployment name) |
| ollama | No | Yes | nomic-embed-text:latest |
| lmstudio | No | Yes | (loaded model) |
| localai | Optional | Yes | text-embedding-ada-002 |
| cohere | Yes | No | embed-english-v3.0 |
| gemini | Yes | No | text-embedding-004 |
| mistral | Yes | No | mistral-embed |
| voyageai | Yes | No | voyage-large-2-instruct |
| openrouter | Yes | No | baai/bge-m3 |
| generic-openai | Optional | Either | (model-dependent) |
| litellm | Optional | Yes | (model-dependent) |
| lemonade | No | Yes | Qwen3-embedder |
Values of EMBEDDING_ENGINE and EMBEDDING_MODEL_PREF set via environment variables take precedence over any values stored in the AnythingLLM database. Remove these variables from .env if you want to manage the embedder via the UI.
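The precedence rule amounts to a simple lookup order. The sketch below is illustrative only: `resolve_setting` and the `stored` mapping are hypothetical stand-ins for however AnythingLLM reads its environment and database, and treating an empty value as unset is an assumption.

```python
from typing import Mapping, Optional


def resolve_setting(name: str, env: Mapping[str, str], stored: Mapping[str, str]) -> Optional[str]:
    """Prefer the environment value; fall back to the stored (UI-managed) value."""
    env_value = env.get(name)
    if env_value:  # assumption: an empty string counts as "not set"
        return env_value
    return stored.get(name)
```

This is why a value chosen in the UI appears to have no effect while the same variable is still present in .env.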