Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/Mintplex-Labs/anything-llm/llms.txt

Use this file to discover all available pages before exploring further.

AnythingLLM ships with connectors for more than 30 language model providers — from fully local runtimes such as Ollama and LM Studio to managed cloud APIs like OpenAI, Anthropic, and AWS Bedrock. You can switch providers at any time without re-importing documents; only new chat completions will use the updated provider. The active provider is controlled by the LLM_PROVIDER environment variable (or selected in the setup wizard), and each provider then reads its own set of keys and preferences from the environment.

Two Ways to Configure

When you first launch AnythingLLM, the setup wizard walks you through selecting a provider and entering credentials directly in the browser. You can return to these settings at any time via Settings → LLM Preference in the sidebar.The UI writes your choices to the database, so no .env edits are required for basic provider selection when using the wizard.

Local / Self-Hosted Providers

These providers run entirely on your own hardware — no data is sent to external services.
Ollama is the most popular self-hosted LLM runtime and the recommended starting point for local deployments.
LLM_PROVIDER='ollama'
OLLAMA_BASE_PATH='http://host.docker.internal:11434'
OLLAMA_MODEL_PREF='llama3'
OLLAMA_MODEL_TOKEN_LIMIT=4096
# Optional — only if your Ollama server requires a Bearer token:
OLLAMA_AUTH_TOKEN='your-ollama-auth-token-here'
# Optional — max response timeout in ms (default 5 min):
# OLLAMA_RESPONSE_TIMEOUT=7200000
When running AnythingLLM inside Docker, use host.docker.internal to reach Ollama on the host machine. On Linux you may need --add-host=host.docker.internal:host-gateway in your Docker run command.
LM Studio exposes an OpenAI-compatible local API server.
LLM_PROVIDER='lmstudio'
LMSTUDIO_BASE_PATH='http://your-server:1234/v1'
LMSTUDIO_MODEL_PREF='Loaded from Chat UI'
LMSTUDIO_MODEL_TOKEN_LIMIT=4096
# Optional auth token if LM Studio is configured with one:
LMSTUDIO_AUTH_TOKEN='your-lmstudio-auth-token-here'
LocalAI is an open-source, self-hosted inference server compatible with the OpenAI API.
LLM_PROVIDER='localai'
LOCAL_AI_BASE_PATH='http://host.docker.internal:8080/v1'
LOCAL_AI_MODEL_PREF='luna-ai-llama2'
LOCAL_AI_MODEL_TOKEN_LIMIT=4096
LOCAL_AI_API_KEY="sk-123abc"
KoboldCPP is a single-file LLM runtime popular in the creative writing community.
LLM_PROVIDER='koboldcpp'
KOBOLD_CPP_BASE_PATH='http://127.0.0.1:5000/v1'
KOBOLD_CPP_MODEL_PREF='koboldcpp/codellama-7b-instruct.Q4_K_S'
KOBOLD_CPP_MODEL_TOKEN_LIMIT=4096
Supports any model loaded in the popular oobabooga text-generation-webui.
LLM_PROVIDER='textgenwebui'
TEXT_GEN_WEB_UI_BASE_PATH='http://127.0.0.1:5000/v1'
TEXT_GEN_WEB_UI_TOKEN_LIMIT=4096
TEXT_GEN_WEB_UI_API_KEY='sk-123abc'
NVIDIA NIM exposes locally deployed NVIDIA-accelerated models through an OpenAI-compatible endpoint.
LLM_PROVIDER='nvidia-nim'
NVIDIA_NIM_LLM_BASE_PATH='http://127.0.0.1:8000'
NVIDIA_NIM_LLM_MODEL_PREF='meta/llama-3.2-3b-instruct'
Docker Model Runner runs models directly inside the Docker engine on supported hardware.
LLM_PROVIDER='docker-model-runner'
DOCKER_MODEL_RUNNER_BASE_PATH='http://127.0.0.1:12434'
DOCKER_MODEL_RUNNER_LLM_MODEL_PREF='phi-3.5-mini'
DOCKER_MODEL_RUNNER_LLM_MODEL_TOKEN_LIMIT=4096
Lemonade is an AMD-optimized local inference server (part of the AMD ROCm ecosystem).
LLM_PROVIDER='lemonade'
LEMONADE_LLM_BASE_PATH='http://127.0.0.1:8000'
LEMONADE_LLM_MODEL_PREF='Llama-3.2-3B-Instruct'
# Optional — API key if your Lemonade server requires authentication:
# LEMONADE_LLM_API_KEY='your-lemonade-api-key'
# Optional — override model context window size:
# LEMONADE_LLM_MODEL_TOKEN_LIMIT=8192
Microsoft Foundry Local is a local inference runtime for Windows with hardware acceleration.
LLM_PROVIDER='foundry'
FOUNDRY_BASE_PATH='http://127.0.0.1:55776'
FOUNDRY_MODEL_PREF='phi-3.5-mini'
FOUNDRY_MODEL_TOKEN_LIMIT=4096
PrivateMode is a privacy-focused local inference server.
LLM_PROVIDER='privatemode'
PRIVATEMODE_LLM_BASE_PATH='http://127.0.0.1:8080'
PRIVATEMODE_LLM_MODEL_PREF='gemma-3-27b'

Cloud / Commercial Providers

LLM_PROVIDER='openai'
OPEN_AI_KEY=sk-...
OPEN_MODEL_PREF='gpt-4o'
The OPEN_MODEL_PREF can be any model ID returned by the OpenAI models API. Popular choices include gpt-4o, gpt-4o-mini, o1, and o3-mini.
LLM_PROVIDER='azure'
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_KEY=...
AZURE_OPENAI_MODEL_PREF='my-gpt35-deployment'
# For Azure-based embeddings:
# EMBEDDING_MODEL_PREF='embedder-model'
AZURE_OPENAI_MODEL_PREF is the deployment name you created in Azure OpenAI Studio, not the base model name.
LLM_PROVIDER='bedrock'
AWS_BEDROCK_LLM_REGION=us-west-2
AWS_BEDROCK_API_KEY="..."
AWS_BEDROCK_LLM_MODEL_PREFERENCE=global.anthropic.claude-haiku-4-5-20251001-v1:0
AWS_BEDROCK_LLM_MODEL_TOKEN_LIMIT='128000'
# Disable streaming if your network or model does not support it:
# AWS_BEDROCK_STREAMING_DISABLED=1
LLM_PROVIDER='mistral'
MISTRAL_API_KEY='...'
MISTRAL_MODEL_PREF='mistral-tiny'
LLM_PROVIDER='groq'
GROQ_API_KEY=gsk_...
GROQ_MODEL_PREF=llama3-8b-8192
LLM_PROVIDER='cohere'
COHERE_API_KEY=...
COHERE_MODEL_PREF='command-r'
LLM_PROVIDER='perplexity'
PERPLEXITY_API_KEY='...'
PERPLEXITY_MODEL_PREF='codellama-34b-instruct'
LLM_PROVIDER='togetherai'
TOGETHER_AI_API_KEY='...'
TOGETHER_AI_MODEL_PREF='mistralai/Mixtral-8x7B-Instruct-v0.1'
LLM_PROVIDER='fireworksai'
FIREWORKS_AI_LLM_API_KEY='...'
FIREWORKS_AI_LLM_MODEL_PREF='accounts/fireworks/models/llama-v3p1-8b-instruct'
LLM_PROVIDER='openrouter'
OPENROUTER_API_KEY='...'
OPENROUTER_MODEL_PREF='openrouter/auto'
LLM_PROVIDER='deepseek'
DEEPSEEK_API_KEY='...'
DEEPSEEK_MODEL_PREF='deepseek-chat'
LLM_PROVIDER='xai'
XAI_LLM_API_KEY='xai-...'
XAI_LLM_MODEL_PREF='grok-beta'
LLM_PROVIDER='novita'
NOVITA_LLM_API_KEY='...'
NOVITA_LLM_MODEL_PREF='deepseek/deepseek-r1'
LLM_PROVIDER='sambanova'
SAMBANOVA_LLM_API_KEY='...'
SAMBANOVA_LLM_MODEL_PREF='gpt-oss-120b'
LLM_PROVIDER='cerebras'
CEREBRAS_API_KEY='...'
CEREBRAS_MODEL_PREF='gpt-oss-120b'
LLM_PROVIDER='minimax'
MINIMAX_API_KEY='sk-cp-...'
MINIMAX_MODEL_PREF='MiniMax-M2.7'
LLM_PROVIDER='moonshotai'
MOONSHOT_AI_API_KEY='...'
MOONSHOT_AI_MODEL_PREF='moonshot-v1-32k'
LLM_PROVIDER='giteeai'
GITEE_AI_API_KEY=...
GITEE_AI_MODEL_PREF=...
GITEE_AI_MODEL_TOKEN_LIMIT=...
LLM_PROVIDER='zai'
ZAI_API_KEY="..."
ZAI_MODEL_PREF="glm-4.5"
LLM_PROVIDER='ppio'
PPIO_API_KEY='...'
PPIO_MODEL_PREF=deepseek/deepseek-v3/community
LLM_PROVIDER='apipie'
APIPIE_LLM_API_KEY='sk-...'
APIPIE_LLM_MODEL_PREF='openrouter/llama-3.1-8b-instruct'
LLM_PROVIDER='cometapi'
COMETAPI_LLM_API_KEY='...'
COMETAPI_LLM_MODEL_PREF='gpt-5-mini'
# Optional — stream idle timeout in ms (minimum 500):
# COMETAPI_LLM_TIMEOUT_MS=500

Generic / Compatible Providers

Use these when you have any OpenAI-compatible endpoint that doesn’t have its own named connector, or when routing through an aggregation layer.
LLM_PROVIDER='generic-openai'
GENERIC_OPEN_AI_BASE_PATH='http://proxy.url.openai.com/v1'
GENERIC_OPEN_AI_MODEL_PREF='gpt-3.5-turbo'
GENERIC_OPEN_AI_MODEL_TOKEN_LIMIT=4096
GENERIC_OPEN_AI_API_KEY=sk-123abc
# Optional — inject custom HTTP headers (useful for auth proxies):
# GENERIC_OPEN_AI_CUSTOM_HEADERS="X-Custom-Auth:my-secret-key,X-Custom-Header:my-value"
This is the most flexible option. Any server that speaks the OpenAI Chat Completions API — vLLM, TGI, Aphrodite, etc. — works here.
LiteLLM acts as a universal proxy, translating dozens of provider APIs into a single OpenAI-compatible interface.
LLM_PROVIDER='litellm'
LITE_LLM_MODEL_PREF='gpt-3.5-turbo'
LITE_LLM_MODEL_TOKEN_LIMIT=4096
LITE_LLM_BASE_PATH='http://127.0.0.1:4000'
LITE_LLM_API_KEY='sk-123abc'

Provider Quick-Reference

Provider keyCategoryAuth variable
openaiCloudOPEN_AI_KEY
anthropicCloudANTHROPIC_API_KEY
geminiCloudGEMINI_API_KEY
azureCloudAZURE_OPENAI_KEY
bedrockCloudAWS_BEDROCK_API_KEY
mistralCloudMISTRAL_API_KEY
groqCloudGROQ_API_KEY
cohereCloudCOHERE_API_KEY
perplexityCloudPERPLEXITY_API_KEY
togetheraiCloudTOGETHER_AI_API_KEY
fireworksaiCloudFIREWORKS_AI_LLM_API_KEY
openrouterCloudOPENROUTER_API_KEY
deepseekCloudDEEPSEEK_API_KEY
xaiCloudXAI_LLM_API_KEY
novitaCloudNOVITA_LLM_API_KEY
sambanovaCloudSAMBANOVA_LLM_API_KEY
cerebrasCloudCEREBRAS_API_KEY
minimaxCloudMINIMAX_API_KEY
moonshotaiCloudMOONSHOT_AI_API_KEY
zaiCloudZAI_API_KEY
ppioCloudPPIO_API_KEY
apipieCloudAPIPIE_LLM_API_KEY
cometapiCloudCOMETAPI_LLM_API_KEY
giteeaiCloudGITEE_AI_API_KEY
ollamaLocal(none required)
lmstudioLocal(none required)
localaiLocalLOCAL_AI_API_KEY
koboldcppLocal(none required)
textgenwebuiLocalTEXT_GEN_WEB_UI_API_KEY
nvidia-nimLocal(none required)
docker-model-runnerLocal(none required)
lemonadeLocal(none required — LEMONADE_LLM_API_KEY optional)
foundryLocal(none required)
privatemodeLocal(none required)
generic-openaiGenericGENERIC_OPEN_AI_API_KEY
litellmGenericLITE_LLM_API_KEY
The LLM_PROVIDER value set via environment variables takes precedence over any selection stored in the database. If you want the UI to control the provider, remove LLM_PROVIDER from your .env file.

Build docs developers (and LLMs) love