Every time AnythingLLM embeds a document chunk, it writes the resulting vector alongside the source text and metadata to a vector database. When a user sends a chat message, AnythingLLM converts the query into a vector with the same embedding model and performs a similarity search against that database to retrieve the most relevant context before calling the LLM. The vector database you choose affects storage capacity, query latency, scalability, and operational complexity, so it is worth picking the right one for your deployment size and infrastructure.
Switching from one vector database to another requires re-embedding all documents in every workspace. The vectors stored in the old database cannot be migrated automatically. Plan your database choice before ingesting large document collections.
Selecting a Vector Database
Set the VECTOR_DB environment variable to the provider identifier, then supply the connection variables for that provider. Restart AnythingLLM after any change.
Supported Vector Databases
LanceDB (default, built-in)
LanceDB is a high-performance columnar vector database embedded directly into the AnythingLLM process. It requires no external server, no ports, and no credentials; data is written to the STORAGE_DIR directory on disk. Setting VECTOR_DB to lancedb is the entire configuration. LanceDB scales well for single-server deployments handling thousands of workspaces and millions of vectors.
Best for: Local use, single-user deployments, development, and any scenario where operational simplicity is the priority.
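A minimal sketch of the LanceDB setup, assuming configuration lives in AnythingLLM's .env file. The identifier is taken from the quick-reference table; no other variable is involved.

```shell
# Use the embedded LanceDB store; no connection variables are needed.
VECTOR_DB="lancedb"
```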
Chroma (self-hosted)
Chroma is an open-source vector database that you run alongside AnythingLLM, either locally or in a container. Start a Chroma instance first, then point CHROMA_ENDPOINT at it.
Best for: Teams wanting a self-hosted, open-source option with a familiar Python SDK and easy local setup.
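A sketch of a self-hosted Chroma setup. The Docker command in the comment is an assumption based on Chroma's published image rather than something this page specifies, and the endpoint value is a placeholder for wherever your instance listens.

```shell
# One common way to start Chroma locally (assumption: Docker and the
# chromadb/chroma image; adjust the port mapping to your setup):
#   docker run -p 8000:8000 chromadb/chroma

# .env entries for AnythingLLM (endpoint value is a placeholder)
VECTOR_DB="chroma"
CHROMA_ENDPOINT="http://localhost:8000"
```

If AnythingLLM itself runs in a container, localhost refers to that container, so the endpoint usually needs the host's address instead.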
Chroma Cloud
Chroma Cloud is the managed, hosted version of Chroma. Obtain your API key and tenant/database identifiers from the Chroma Cloud dashboard.
Best for: Teams that want Chroma without managing infrastructure.
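A sketch of the Chroma Cloud entries, using the variable names from the quick-reference table. Every value here is a hypothetical placeholder; substitute the ones shown in your dashboard.

```shell
VECTOR_DB="chromacloud"
# Placeholder credentials and identifiers (not real values)
CHROMACLOUD_API_KEY="your-api-key"
CHROMACLOUD_TENANT="your-tenant-id"
CHROMACLOUD_DATABASE="your-database-name"
```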
Pinecone
Pinecone is a fully managed cloud vector database with a generous free tier and strong global availability. Create your index in the Pinecone console before connecting. Ensure the index dimension matches the output dimension of your embedding model (e.g., 1536 for text-embedding-ada-002).
Best for: Cloud-first teams that want zero infrastructure management and built-in replication.
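A sketch of the Pinecone entries, using the variable names from the quick-reference table. The key and the index name are hypothetical placeholders, and the index is assumed to exist already.

```shell
VECTOR_DB="pinecone"
PINECONE_API_KEY="your-api-key"
# Name of an index created beforehand; its dimension must match the
# embedding model (for example 1536 for text-embedding-ada-002).
PINECONE_INDEX="my-index"
```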
Qdrant
Qdrant is a high-performance, Rust-based open-source vector database available as a self-hosted binary, Docker image, or managed cloud service. For local testing, start a Qdrant instance and point QDRANT_ENDPOINT at it.
QDRANT_API_KEY is only required for Qdrant Cloud or self-hosted instances secured with an API key. For unauthenticated local instances, omit it.
Best for: High-throughput workloads and teams wanting a fast self-hosted option with rich payload filtering.
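A sketch of a local Qdrant setup. The Docker command in the comment is an assumption based on Qdrant's published image, not something this page specifies, and the endpoint is a placeholder.

```shell
# One common way to start Qdrant locally (assumption: Docker and the
# qdrant/qdrant image, which serves its HTTP API on port 6333):
#   docker run -p 6333:6333 qdrant/qdrant

VECTOR_DB="qdrant"
QDRANT_ENDPOINT="http://localhost:6333"
# Only for Qdrant Cloud or key-protected instances; leave unset otherwise.
# QDRANT_API_KEY="your-api-key"
```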
Weaviate
Weaviate is a cloud-native vector database with a graph-like object model, rich filtering, and built-in hybrid search.
WEAVIATE_API_KEY is required for Weaviate Cloud (WCS) instances. For unauthenticated local Docker instances, omit it.
Best for: Teams that want hybrid semantic + keyword search and complex object-level metadata filtering.
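A sketch of the Weaviate entries for an unauthenticated local instance. The endpoint is a placeholder that assumes Weaviate's default HTTP port of 8080.

```shell
VECTOR_DB="weaviate"
WEAVIATE_ENDPOINT="http://localhost:8080"
# Needed for Weaviate Cloud; leave unset for unauthenticated local instances.
# WEAVIATE_API_KEY="your-api-key"
```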
Milvus
Milvus is an open-source, horizontally scalable vector database built for enterprise-scale workloads. For local testing, start a minimal Milvus cluster and point MILVUS_ADDRESS at it.
Best for: Enterprise deployments needing horizontal scaling, high availability, and billions of vectors.
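A sketch of the Milvus entries, using the variable names from the quick-reference table. The address assumes Milvus's default port of 19530 on the local machine, and the credentials are hypothetical placeholders.

```shell
VECTOR_DB="milvus"
# Placeholder address; 19530 is the default Milvus service port
MILVUS_ADDRESS="http://localhost:19530"
MILVUS_USERNAME="your-username"
MILVUS_PASSWORD="your-password"
```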
Zilliz Cloud
Zilliz Cloud is the fully managed cloud version of Milvus, operated by the Milvus creators. Find your cluster endpoint and generate an API token in the Zilliz Cloud console.
Best for: Teams that want Milvus’s scalability without the operational overhead of self-hosting.
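A sketch of the Zilliz Cloud entries, using the variable names from the quick-reference table. Both values are hypothetical placeholders; copy the real ones from the console.

```shell
VECTOR_DB="zilliz"
# Placeholder endpoint and token (not real values)
ZILLIZ_ENDPOINT="https://your-cluster-endpoint.example.com"
ZILLIZ_API_TOKEN="your-api-token"
```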
pgvector (PostgreSQL)
pgvector is a PostgreSQL extension that adds vector similarity search to an existing Postgres database. It lets you store embeddings alongside your application data with no separate vector-database process. Ensure the pgvector extension is installed in the target database before connecting.
Best for: Teams that already operate PostgreSQL and want to avoid running a separate vector-database service.
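A sketch of the pgvector setup. The psql command in the comment is one assumed way to enable the extension, and the connection string is a hypothetical placeholder with made-up user, password, and database names.

```shell
# One way to enable the extension (assumption: psql is installed and the role
# may create extensions; pgvector registers itself under the name "vector"):
#   psql "$PGVECTOR_CONNECTION_STRING" -c 'CREATE EXTENSION IF NOT EXISTS vector;'

VECTOR_DB="pgvector"
PGVECTOR_CONNECTION_STRING="postgresql://dbuser:dbpass@localhost:5432/yourdb"
```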
Astra DB (DataStax)
Astra DB is DataStax’s serverless, cloud-native database built on Apache Cassandra with integrated vector search. Obtain your application token and endpoint from the Astra DB console.
Best for: Teams using the Cassandra / DataStax ecosystem who want serverless, pay-as-you-go vector storage.
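A sketch of the Astra DB entries, using the variable names from the quick-reference table. Both values are hypothetical placeholders; copy the real ones from the console.

```shell
VECTOR_DB="astra"
# Placeholder token and endpoint (not real values)
ASTRA_DB_APPLICATION_TOKEN="your-application-token"
ASTRA_DB_ENDPOINT="https://your-database-endpoint.example.com"
```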
Provider Quick-Reference
| VECTOR_DB value | Hosting | Required variables |
|---|---|---|
| lancedb | Built-in (embedded) | (none) |
| chroma | Self-hosted | CHROMA_ENDPOINT |
| chromacloud | Managed (Chroma) | CHROMACLOUD_API_KEY, CHROMACLOUD_TENANT, CHROMACLOUD_DATABASE |
| pinecone | Managed | PINECONE_API_KEY, PINECONE_INDEX |
| qdrant | Self-hosted or managed | QDRANT_ENDPOINT |
| weaviate | Self-hosted or managed | WEAVIATE_ENDPOINT |
| milvus | Self-hosted | MILVUS_ADDRESS, MILVUS_USERNAME, MILVUS_PASSWORD |
| zilliz | Managed (Zilliz Cloud) | ZILLIZ_ENDPOINT, ZILLIZ_API_TOKEN |
| pgvector | Self-hosted (PostgreSQL) | PGVECTOR_CONNECTION_STRING |
| astra | Managed (DataStax) | ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_ENDPOINT |
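The table can also be turned into a small pre-flight check. The sketch below is a hypothetical helper, not part of AnythingLLM: the function names required_vars and missing_vars are invented for illustration, and the variable lists are copied from the table above.

```shell
# Hypothetical helper: print the variables the table lists for a provider.
required_vars() {
  case "$1" in
    lancedb)     echo "" ;;
    chroma)      echo "CHROMA_ENDPOINT" ;;
    chromacloud) echo "CHROMACLOUD_API_KEY CHROMACLOUD_TENANT CHROMACLOUD_DATABASE" ;;
    pinecone)    echo "PINECONE_API_KEY PINECONE_INDEX" ;;
    qdrant)      echo "QDRANT_ENDPOINT" ;;
    weaviate)    echo "WEAVIATE_ENDPOINT" ;;
    milvus)      echo "MILVUS_ADDRESS MILVUS_USERNAME MILVUS_PASSWORD" ;;
    zilliz)      echo "ZILLIZ_ENDPOINT ZILLIZ_API_TOKEN" ;;
    pgvector)    echo "PGVECTOR_CONNECTION_STRING" ;;
    astra)       echo "ASTRA_DB_APPLICATION_TOKEN ASTRA_DB_ENDPOINT" ;;
    *)           return 1 ;;
  esac
}

# Hypothetical helper: print the required variables that are unset or empty.
missing_vars() {
  provider="$1"
  names=$(required_vars "$provider") || { echo "unknown provider: $provider"; return 2; }
  missing=""
  for name in $names; do
    # Indirect lookup of the variable called $name (POSIX-compatible)
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  # Strip the leading space left by the accumulation above
  echo "${missing# }"
}
```

Running missing_vars "$VECTOR_DB" after loading your environment prints nothing when every listed variable is set, and otherwise prints the names that still need values.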
The VECTOR_DB variable set via the environment takes precedence over any selection stored in the AnythingLLM database. Remove it from .env if you want to manage the vector database through the Settings UI instead.