When DeepWiki clones a repository it splits the source files into chunks, converts each chunk into a vector embedding, and stores those vectors in a local index under ~/.adalflow/databases/. These embeddings power two features: wiki generation, where relevant code is retrieved as context for each documentation section, and the Ask feature, where your question is matched against the index to return accurate, code-grounded answers. Choosing an embedding provider is therefore a foundational configuration decision.
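The chunk-embed-retrieve pipeline described above can be pictured end to end. The sketch below is an illustrative toy, not DeepWiki's implementation: `embed` is a stand-in for whichever provider you configure (here, a trivial bag-of-letters vector), and the similarity search mirrors what the local vector index does conceptually.

```python
import math

# Stand-in embedder: a real deployment would call OpenAI, Google, Ollama,
# or Bedrock here. This toy maps each text to a 26-dim letter-count vector.
def embed(text):
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, index, top_k=20):
    """Return the top_k chunks whose embeddings best match the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item["vector"]), reverse=True)
    return [item["chunk"] for item in ranked[:top_k]]

# "Indexing": embed each chunk and store it alongside its vector.
chunks = ["def parse_config(path): ...", "class WikiGenerator: ...", "README: install with pip"]
index = [{"chunk": c, "vector": embed(c)} for c in chunks]

print(retrieve("how do I parse the config file?", index, top_k=1))
# the config-parsing chunk ranks first
```

Both wiki generation and Ask reduce to this shape: a query vector compared against the stored chunk vectors, with the best matches handed to the language model as context.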
Embedder types
DeepWiki supports four embedding providers, selected with the DEEPWIKI_EMBEDDER_TYPE environment variable.
| Type | Model | API key required | Notes |
|---|---|---|---|
| `openai` | text-embedding-3-small (256 dimensions) | `OPENAI_API_KEY` | Default. Batch size 500. |
| `google` | gemini-embedding-001 | `GOOGLE_API_KEY` | Reuses your existing Gemini key. Batch size 100. |
| `ollama` | nomic-embed-text | None | Requires a local Ollama installation. |
| `bedrock` | amazon.titan-embed-text-v2:0 (256 dimensions) | AWS credentials | Batch size 100. |
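The batch sizes matter because each provider caps how many texts a single embeddings request may carry. A minimal batching helper, using the batch sizes from the table above (the fallback value and everything else here is an illustrative assumption):

```python
# Per-provider request batch sizes, taken from the table above.
BATCH_SIZES = {"openai": 500, "google": 100, "bedrock": 100}

def batches(chunks, embedder_type):
    """Yield lists of chunks no larger than the provider's batch size."""
    size = BATCH_SIZES.get(embedder_type, 100)  # assumption: default to 100
    for i in range(0, len(chunks), size):
        yield chunks[i:i + size]

docs = [f"chunk-{i}" for i in range(1200)]
print([len(b) for b in batches(docs, "google")][:3])  # [100, 100, 100]
print(len(list(batches(docs, "openai"))))             # 3 requests (500 + 500 + 200)
```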
Setting the embedder type
Add DEEPWIKI_EMBEDDER_TYPE to your .env file or environment:
```
# OpenAI (default; no variable needed, but shown here for clarity)
DEEPWIKI_EMBEDDER_TYPE=openai

# Google AI
DEEPWIKI_EMBEDDER_TYPE=google

# Local Ollama
DEEPWIKI_EMBEDDER_TYPE=ollama

# AWS Bedrock
DEEPWIKI_EMBEDDER_TYPE=bedrock
```
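Conceptually, this variable selects which section of embedder.json is used. The sketch below is not DeepWiki's actual code; the section names are the ones that appear in the embedder.json excerpts on this page.

```python
import os

# Map each DEEPWIKI_EMBEDDER_TYPE value to its embedder.json section name.
CONFIG_KEYS = {
    "openai": "embedder",
    "google": "embedder_google",
    "ollama": "embedder_ollama",
    "bedrock": "embedder_bedrock",
}

def active_config_key():
    """Resolve the config section from the environment, defaulting to OpenAI."""
    embedder_type = os.environ.get("DEEPWIKI_EMBEDDER_TYPE", "openai")
    if embedder_type not in CONFIG_KEYS:
        raise ValueError(f"Unknown embedder type: {embedder_type!r}")
    return CONFIG_KEYS[embedder_type]

os.environ["DEEPWIKI_EMBEDDER_TYPE"] = "ollama"
print(active_config_key())  # embedder_ollama
```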
Provider setup
OpenAI
OpenAI is the default embedder. The text-embedding-3-small model is used with 256 dimensions and float encoding.

Required environment variable:

```
OPENAI_API_KEY=your_openai_api_key
```

Optional: custom base URL. If you need to route embedding requests through a private endpoint, set:

```
OPENAI_BASE_URL=https://your-endpoint.com/v1
```

This is the same variable used by the OpenAI text generation client (see Model providers).

embedder.json excerpt:

```json
{
  "embedder": {
    "client_class": "OpenAIClient",
    "batch_size": 500,
    "model_kwargs": {
      "model": "text-embedding-3-small",
      "dimensions": 256,
      "encoding_format": "float"
    }
  }
}
```
Google AI

Google AI embeddings use the gemini-embedding-001 model with the SEMANTIC_SIMILARITY task type, making them well-suited for code retrieval.

Required environment variables:

```
GOOGLE_API_KEY=your_google_api_key
DEEPWIKI_EMBEDDER_TYPE=google
```

No additional setup is required; the same API key used for Gemini text generation works for embeddings.

embedder.json excerpt:

```json
{
  "embedder_google": {
    "client_class": "GoogleEmbedderClient",
    "batch_size": 100,
    "model_kwargs": {
      "model": "gemini-embedding-001",
      "task_type": "SEMANTIC_SIMILARITY"
    }
  }
}
```
If you are already using Google Gemini for text generation, enabling Google AI embeddings keeps your entire pipeline within a single provider, which can improve semantic consistency between retrieved context and generated output.
Ollama

The Ollama embedder runs entirely on your local machine using nomic-embed-text. No API key or external network access is required.

Required setup:

- Install Ollama from ollama.com.
- Pull the embedding model:

  ```
  ollama pull nomic-embed-text
  ```

- Set the embedder type:

  ```
  DEEPWIKI_EMBEDDER_TYPE=ollama
  ```

If Ollama runs on a remote host, also set:

```
OLLAMA_HOST=http://your-ollama-host:11434
```

embedder.json excerpt:

```json
{
  "embedder_ollama": {
    "client_class": "OllamaClient",
    "model_kwargs": {
      "model": "nomic-embed-text"
    }
  }
}
```
AWS Bedrock

The Bedrock embedder uses Amazon Titan Embed Text v2 with 256 dimensions.

Required environment variables:

```
DEEPWIKI_EMBEDDER_TYPE=bedrock
AWS_ACCESS_KEY_ID=your_access_key_id
AWS_SECRET_ACCESS_KEY=your_secret_access_key
AWS_REGION=us-east-1
```

For role-based access:

```
AWS_ROLE_ARN=arn:aws:iam::123456789012:role/DeepWikiRole
```

embedder.json excerpt:

```json
{
  "embedder_bedrock": {
    "client_class": "BedrockClient",
    "batch_size": 100,
    "model_kwargs": {
      "model": "amazon.titan-embed-text-v2:0",
      "dimensions": 256
    }
  }
}
```
Using OpenAI-compatible embedding models
Some providers (such as Alibaba Cloud’s Qwen family) expose an OpenAI-compatible embeddings API. DeepWiki ships a ready-made config template for this case at api/config/embedder.openai_compatible.json.bak.
To switch to an OpenAI-compatible embedder:
- Replace api/config/embedder.json with the contents of the compatible template:

  ```json
  {
    "embedder": {
      "client_class": "OpenAIClient",
      "initialize_kwargs": {
        "api_key": "${OPENAI_API_KEY}",
        "base_url": "${OPENAI_BASE_URL}"
      },
      "batch_size": 10,
      "model_kwargs": {
        "model": "text-embedding-v3",
        "dimensions": 256,
        "encoding_format": "float"
      }
    },
    "retriever": {
      "top_k": 20
    },
    "text_splitter": {
      "split_by": "word",
      "chunk_size": 350,
      "chunk_overlap": 100
    }
  }
  ```

- Set the environment variables:

  ```
  OPENAI_API_KEY=your_provider_api_key
  OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
  ```
DeepWiki automatically substitutes ${OPENAI_API_KEY} and ${OPENAI_BASE_URL} placeholders in embedder.json with the values from your environment. No code changes are needed.
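Substitution of this kind can be sketched in a few lines. The following is an illustration of the `${VAR}` pattern, not DeepWiki's actual config loader:

```python
import json
import os
import re

def substitute_env(raw):
    """Replace ${VAR} placeholders in a string with environment values."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), raw)

os.environ["OPENAI_API_KEY"] = "sk-demo"
os.environ["OPENAI_BASE_URL"] = "https://example.com/v1"

raw = '{"initialize_kwargs": {"api_key": "${OPENAI_API_KEY}", "base_url": "${OPENAI_BASE_URL}"}}'
config = json.loads(substitute_env(raw))
print(config["initialize_kwargs"]["api_key"])  # sk-demo
```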
Switching embedders
Switching DEEPWIKI_EMBEDDER_TYPE after a repository has already been indexed requires regenerating that repository’s embeddings. Embeddings from different models occupy different vector spaces and are not interchangeable. Delete the existing database for the repository under ~/.adalflow/databases/ and regenerate the wiki to rebuild the index with the new embedder.
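The reset step can be scripted. A minimal sketch, assuming the per-repository index lives in a directory named after the repository under ~/.adalflow/databases/ (the exact layout on disk may differ, so verify the path before deleting anything):

```python
import shutil
import tempfile
from pathlib import Path

def reset_repo_index(repo_name, db_root=Path.home() / ".adalflow" / "databases"):
    """Delete a repository's vector index so the next wiki run re-embeds it."""
    target = Path(db_root) / repo_name
    if target.exists():
        shutil.rmtree(target)
        return True
    return False

# Demonstrated against a temporary directory rather than the real database root:
root = Path(tempfile.mkdtemp())
(root / "my-repo").mkdir()
print(reset_repo_index("my-repo", db_root=root))  # True
print(reset_repo_index("my-repo", db_root=root))  # False (already gone)
```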
Text splitting configuration
Regardless of which embedder you choose, DeepWiki splits source files into overlapping chunks before embedding them. The defaults are defined in embedder.json:
```json
{
  "text_splitter": {
    "split_by": "word",
    "chunk_size": 350,
    "chunk_overlap": 100
  },
  "retriever": {
    "top_k": 20
  }
}
```
- chunk_size: Maximum number of words per chunk (350 by default).
- chunk_overlap: Number of words shared between adjacent chunks (100 by default), preserving context at boundaries.
- top_k: Number of chunks retrieved per query during RAG (20 by default).
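Word-based splitting with overlap can be sketched as follows. This is an illustrative reimplementation of the idea, not DeepWiki's splitter:

```python
def split_by_word(text, chunk_size=350, chunk_overlap=100):
    """Split text into word chunks; adjacent chunks share chunk_overlap words."""
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

text = " ".join(str(i) for i in range(800))
chunks = split_by_word(text)
print(len(chunks))           # 3 chunks: words 0-349, 250-599, 500-799
print(chunks[1].split()[0])  # 250; the second chunk starts 100 words back
```

The overlap means a definition that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.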
These values can be tuned in embedder.json or in a custom config directory specified by DEEPWIKI_CONFIG_DIR.