Every time AnythingLLM embeds a document chunk, it writes the resulting vector, together with the source text and metadata, to a vector database. When a user sends a chat message, AnythingLLM embeds the query with the same embedding model and runs a similarity search against that database to retrieve the most relevant chunks as context before calling the LLM. Your choice of vector database affects storage capacity, query latency, scalability, and operational complexity, so pick one that matches your deployment size and infrastructure.
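The retrieval step can be illustrated with a brute-force sketch. It assumes the chunks have already been embedded (the three-dimensional vectors below are made up; real embedding models output hundreds or thousands of dimensions) and scores them with cosine similarity, a common metric. Which metric and index a given provider actually uses depends on that provider and its configuration, and production vector databases generally rely on approximate nearest-neighbour indexes rather than scoring every stored vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[tuple[str, float]]:
    """Score every stored chunk against the query and return the k most similar."""
    scored = [(text, cosine_similarity(query, vector)) for text, vector in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical pre-computed chunk embeddings as (source text, vector) pairs.
store = [
    ("Refunds are processed within 5 days.", [0.9, 0.1, 0.0]),
    ("Our office is in Berlin.", [0.0, 0.2, 0.9]),
    ("Refund requests need an order ID.", [0.8, 0.3, 0.1]),
]
# Stands in for the query embedding produced by the same model.
query_vector = [1.0, 0.2, 0.0]

for text, score in top_k(query_vector, store):
    print(f"{score:.3f}  {text}")
```

The two refund chunks (scores of roughly 0.996 and 0.980) are returned as context, while the unrelated Berlin chunk scores about 0.04 and is left out. This is also why the query must be embedded with the same model as the documents: vectors from different models are not comparable.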
For local or single-user deployments, LanceDB is the default and requires zero configuration. It stores all data inside your STORAGE_DIR directory with no external process needed. Only switch to an external vector database if you need multi-node scalability, advanced filtering, or enterprise features.
Switching from one vector database to another requires re-embedding all documents in every workspace. The vectors stored in the old database cannot be migrated automatically. Plan your database choice before ingesting large document collections.

Selecting a Vector Database

Set the VECTOR_DB environment variable to the provider identifier, then supply the connection variables for that provider. Restart AnythingLLM after any change.
# Example: switch to Qdrant
VECTOR_DB="qdrant"
QDRANT_ENDPOINT="http://localhost:6333"
QDRANT_API_KEY=your-api-key
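The examples on this page mix quoted and unquoted values. When the file is read by a dotenv-style loader, one pair of surrounding quotes is normally stripped, so both forms yield the same value. The hypothetical parse_env helper below is a minimal sketch of that behaviour for checking a file by hand. It is not AnythingLLM's own loader, and it ignores features such as inline comments, multi-line values, and variable expansion.

```python
def parse_env(text: str) -> dict[str, str]:
    """Minimal .env reader: KEY=value lines, '#' comment lines, optional surrounding quotes."""
    env: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, full-line comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one matching pair of single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        env[key.strip()] = value
    return env


sample = """
# Example: switch to Qdrant
VECTOR_DB="qdrant"
QDRANT_ENDPOINT="http://localhost:6333"
QDRANT_API_KEY=your-api-key
"""
print(parse_env(sample))
# {'VECTOR_DB': 'qdrant', 'QDRANT_ENDPOINT': 'http://localhost:6333', 'QDRANT_API_KEY': 'your-api-key'}
```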

Supported Vector Databases

LanceDB is a high-performance columnar vector database embedded directly into the AnythingLLM process. It requires no external server, no ports, and no credentials — data is written to the STORAGE_DIR directory on disk.
VECTOR_DB="lancedb"
That is the entire configuration. LanceDB scales well for single-server deployments handling thousands of workspaces and millions of vectors.
Best for: Local use, single-user deployments, development, and any scenario where operational simplicity is the priority.
Chroma is an open-source vector database that you run alongside AnythingLLM, either locally or in a container.
VECTOR_DB="chroma"
CHROMA_ENDPOINT='http://host.docker.internal:8000'
# Optional auth header and key if your Chroma server requires authentication:
CHROMA_API_HEADER="X-Api-Key"
CHROMA_API_KEY="sk-123abc"
Start a local Chroma instance with:
docker run -p 8000:8000 chromadb/chroma
Best for: Teams wanting a self-hosted, open-source option with a familiar Python SDK and easy local setup.
Chroma Cloud is the managed, hosted version of Chroma.
VECTOR_DB="chromacloud"
CHROMACLOUD_API_KEY="ck-your-api-key"
CHROMACLOUD_TENANT=your-tenant-id
CHROMACLOUD_DATABASE=your-database-name
Obtain your API key and tenant/database identifiers from the Chroma Cloud dashboard.
Best for: Teams that want Chroma without managing infrastructure.
Pinecone is a fully managed cloud vector database with a generous free tier and strong global availability.
VECTOR_DB="pinecone"
PINECONE_API_KEY=your-api-key
PINECONE_INDEX=your-index-name
Create your index in the Pinecone console before connecting. Ensure the index dimension matches the output dimension of your embedding model (e.g., 1536 for text-embedding-ada-002).
Best for: Cloud-first teams that want zero infrastructure management and built-in replication.
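A mismatched dimension is easy to catch before ingesting anything. The sketch below hard-codes the default output dimensions of three OpenAI embedding models (1536 for text-embedding-ada-002 and text-embedding-3-small, 3072 for text-embedding-3-large). The helper name and the lookup table are illustrative assumptions; any other model's dimension has to come from that model's documentation. The text-embedding-3 models can also be asked for shorter vectors, in which case the index must match the requested size instead.

```python
# Default output dimensions of some OpenAI embedding models (illustrative, not exhaustive).
DEFAULT_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_index_dimension(model: str, index_dimension: int) -> None:
    """Raise ValueError if an index of `index_dimension` cannot store this model's vectors."""
    expected = DEFAULT_DIMENSIONS.get(model)
    if expected is None:
        raise KeyError(f"no dimension on record for {model!r}; check the model's documentation")
    if index_dimension != expected:
        raise ValueError(
            f"index dimension {index_dimension} does not match {model} output dimension {expected}"
        )


check_index_dimension("text-embedding-ada-002", 1536)  # matches, so nothing is raised
try:
    check_index_dimension("text-embedding-3-large", 1536)
except ValueError as err:
    print(err)  # index dimension 1536 does not match text-embedding-3-large output dimension 3072
```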
Qdrant is a high-performance, Rust-based open-source vector database available as a self-hosted binary, Docker image, or managed cloud service.
VECTOR_DB="qdrant"
QDRANT_ENDPOINT="http://localhost:6333"
QDRANT_API_KEY=your-api-key
QDRANT_API_KEY is only required for Qdrant Cloud or self-hosted instances secured with an API key. For unauthenticated local instances, omit it.
Start Qdrant locally:
docker run -p 6333:6333 qdrant/qdrant
Best for: High-throughput workloads, teams wanting a fast self-hosted option with rich payload filtering.
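To confirm that QDRANT_ENDPOINT and QDRANT_API_KEY are correct before restarting AnythingLLM, you can query Qdrant's REST API directly. The sketch below builds a GET request for the /collections endpoint and attaches the api-key header only when a key is configured, matching the rule that the key is optional for unauthenticated local instances. The helper name is hypothetical, and the code uses only Python's standard library.

```python
import urllib.request


def qdrant_collections_request(env: dict[str, str]) -> urllib.request.Request:
    """Build (without sending) a GET /collections request for the configured Qdrant instance."""
    endpoint = env["QDRANT_ENDPOINT"].rstrip("/")
    headers: dict[str, str] = {}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:
        # Secured instances, including Qdrant Cloud, read the key from the `api-key` header.
        headers["api-key"] = api_key
    return urllib.request.Request(f"{endpoint}/collections", headers=headers, method="GET")


request = qdrant_collections_request({"QDRANT_ENDPOINT": "http://localhost:6333"})
print(request.full_url)  # http://localhost:6333/collections
```

Sending it with urllib.request.urlopen(request, timeout=5) against a running instance should return a JSON description of the existing collections; an authentication error usually points to a missing or wrong key.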
Weaviate is a cloud-native vector database with a graph-like object model, rich filtering, and built-in hybrid search.
VECTOR_DB="weaviate"
WEAVIATE_ENDPOINT="http://localhost:8080"
WEAVIATE_API_KEY=your-api-key
WEAVIATE_API_KEY is required for Weaviate Cloud (WCS) instances. For unauthenticated local Docker instances, omit it.
Best for: Teams that want hybrid semantic + keyword search and complex object-level metadata filtering.
Milvus is an open-source, horizontally scalable vector database built for enterprise-scale workloads.
VECTOR_DB="milvus"
MILVUS_ADDRESS="http://localhost:19530"
MILVUS_USERNAME=your-username
MILVUS_PASSWORD=your-password
Start a standalone Milvus instance locally:
docker compose up -d  # run in the directory containing the official Milvus docker-compose.yml
Best for: Enterprise deployments needing horizontal scaling, high availability, and billions of vectors.
Zilliz Cloud is the fully managed cloud version of Milvus, operated by the Milvus creators.
VECTOR_DB="zilliz"
ZILLIZ_ENDPOINT="https://sample.api.gcp-us-west1.zillizcloud.com"
ZILLIZ_API_TOKEN=your-api-token
Find your cluster endpoint and generate an API token in the Zilliz Cloud console.
Best for: Teams that want Milvus’s scalability without the operational overhead of self-hosting.
pgvector is a PostgreSQL extension that adds vector similarity search to an existing Postgres database. It lets you store embeddings alongside your application data with no separate vector-database process.
VECTOR_DB="pgvector"
PGVECTOR_CONNECTION_STRING="postgresql://dbuser:dbuserpass@localhost:5432/yourdb"
# Optional — customise the vectors table name (default: anythingllm_vectors):
# PGVECTOR_TABLE_NAME="anythingllm_vectors"
Ensure the pgvector extension is installed in the target database:
CREATE EXTENSION IF NOT EXISTS vector;
Best for: Teams that already operate PostgreSQL and want to avoid running a separate vector-database service.
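PGVECTOR_CONNECTION_STRING is a standard PostgreSQL connection URI, so a typo in the user, host, port, or database name can be spotted by splitting it with Python's standard library before AnythingLLM tries to connect. The describe_pg_connection helper is hypothetical and only parses the string; it does not open a connection. If the password contains characters such as @, :, or /, they must be percent-encoded in the URI.

```python
from urllib.parse import urlsplit


def describe_pg_connection(conn: str) -> dict[str, object]:
    """Split a postgresql:// URI into its parts so mistakes are easy to spot."""
    parts = urlsplit(conn)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme {parts.scheme!r}")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL's default port when none is given
        "database": parts.path.lstrip("/"),
        "has_password": parts.password is not None,  # never print the password itself
    }


print(describe_pg_connection("postgresql://dbuser:dbuserpass@localhost:5432/yourdb"))
# {'user': 'dbuser', 'host': 'localhost', 'port': 5432, 'database': 'yourdb', 'has_password': True}
```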
Astra DB is DataStax’s serverless, cloud-native database built on Apache Cassandra with integrated vector search.
VECTOR_DB="astra"
ASTRA_DB_APPLICATION_TOKEN=AstraCS:...
ASTRA_DB_ENDPOINT=https://your-db-id-region.apps.astra.datastax.com
Obtain your application token and endpoint from the Astra DB console.
Best for: Teams using the Cassandra / DataStax ecosystem who want serverless, pay-as-you-go vector storage.

Provider Quick-Reference

VECTOR_DB value | Hosting | Required variables
--- | --- | ---
lancedb | Built-in (embedded) | (none)
chroma | Self-hosted | CHROMA_ENDPOINT
chromacloud | Managed (Chroma) | CHROMACLOUD_API_KEY, CHROMACLOUD_TENANT, CHROMACLOUD_DATABASE
pinecone | Managed | PINECONE_API_KEY, PINECONE_INDEX
qdrant | Self-hosted or managed | QDRANT_ENDPOINT
weaviate | Self-hosted or managed | WEAVIATE_ENDPOINT
milvus | Self-hosted | MILVUS_ADDRESS, MILVUS_USERNAME, MILVUS_PASSWORD
zilliz | Managed (Zilliz Cloud) | ZILLIZ_ENDPOINT, ZILLIZ_API_TOKEN
pgvector | Self-hosted (PostgreSQL) | PGVECTOR_CONNECTION_STRING
astra | Managed (DataStax) | ASTRA_DB_APPLICATION_TOKEN, ASTRA_DB_ENDPOINT
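The quick-reference table can double as a pre-flight check. The sketch below copies its required variables into a dictionary and reports which ones are unset or empty for the selected provider. It is a hypothetical helper, not part of AnythingLLM, and it does not check optional variables such as QDRANT_API_KEY or WEAVIATE_API_KEY, which managed or secured instances also need.

```python
# Required variables per VECTOR_DB value, copied from the quick-reference table.
REQUIRED_VARS: dict[str, tuple[str, ...]] = {
    "lancedb": (),
    "chroma": ("CHROMA_ENDPOINT",),
    "chromacloud": ("CHROMACLOUD_API_KEY", "CHROMACLOUD_TENANT", "CHROMACLOUD_DATABASE"),
    "pinecone": ("PINECONE_API_KEY", "PINECONE_INDEX"),
    "qdrant": ("QDRANT_ENDPOINT",),
    "weaviate": ("WEAVIATE_ENDPOINT",),
    "milvus": ("MILVUS_ADDRESS", "MILVUS_USERNAME", "MILVUS_PASSWORD"),
    "zilliz": ("ZILLIZ_ENDPOINT", "ZILLIZ_API_TOKEN"),
    "pgvector": ("PGVECTOR_CONNECTION_STRING",),
    "astra": ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_ENDPOINT"),
}


def missing_vars(env: dict[str, str]) -> list[str]:
    """Return the required variables that are unset or empty for the selected VECTOR_DB."""
    provider = env.get("VECTOR_DB", "").strip()
    if provider not in REQUIRED_VARS:
        raise ValueError(f"VECTOR_DB is unset or not a supported value: {provider!r}")
    return [name for name in REQUIRED_VARS[provider] if not env.get(name, "").strip()]


print(missing_vars({"VECTOR_DB": "pinecone", "PINECONE_API_KEY": "pk-123"}))  # ['PINECONE_INDEX']
```

To check the live environment instead of a hand-written dictionary, pass dict(os.environ) after importing os.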
The VECTOR_DB variable set via environment takes precedence over any selection stored in the AnythingLLM database. Remove it from .env if you want to manage the vector database through the Settings UI instead.
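The precedence rule can be summarised as a small resolution function. This is a sketch of the documented behaviour, not AnythingLLM's code: stored_setting is a hypothetical stand-in for whatever selection the Settings UI has saved, and the final fallback to lancedb reflects LanceDB being the default provider.

```python
from typing import Optional

DEFAULT_VECTOR_DB = "lancedb"


def effective_vector_db(env: dict[str, str], stored_setting: Optional[str]) -> str:
    """Pick the active provider: environment first, then the UI-stored choice, then the default."""
    from_env = env.get("VECTOR_DB", "").strip()
    if from_env:
        return from_env  # VECTOR_DB in the environment always wins
    if stored_setting:
        return stored_setting  # otherwise use the selection saved through Settings
    return DEFAULT_VECTOR_DB


print(effective_vector_db({"VECTOR_DB": "qdrant"}, "pinecone"))  # qdrant
print(effective_vector_db({}, "pinecone"))  # pinecone
```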
