

When DeepWiki clones a repository it splits the source files into chunks, converts each chunk into a vector embedding, and stores those vectors in a local index under ~/.adalflow/databases/. These embeddings power two features: wiki generation, where relevant code is retrieved as context for each documentation section, and the Ask feature, where your question is matched against the index to return accurate, code-grounded answers. Choosing an embedding provider is therefore a foundational configuration decision.
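The retrieval flow described above can be illustrated with a toy bag-of-words sketch. Everything below (the `embed`, `cosine`, and `ask` helpers, the example chunks) is invented for illustration; a real run would call the configured embedding provider and DeepWiki's own index, not this code:

```python
import math
import re

def tokens(text):
    """Lowercase word tokens; a crude stand-in for real tokenization."""
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text, vocab):
    """Toy bag-of-words embedding, L2-normalized so cosine similarity
    reduces to a dot product. A real embedder returns dense vectors."""
    toks = tokens(text)
    counts = [toks.count(w) for w in vocab]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# "Chunks" standing in for split source files.
chunks = [
    "def parse_config(path): load the embedder settings",
    "class WikiGenerator: renders documentation sections",
    "def embed_chunks(files): convert code chunks to vectors",
]
vocab = sorted({w for c in chunks for w in tokens(c)})
index = [(c, embed(c, vocab)) for c in chunks]  # the stored vector index

def ask(question):
    """Match a question against the index, as the Ask feature does."""
    q = embed(question, vocab)
    return max(index, key=lambda item: cosine(q, item[1]))[0]
```

Calling `ask("how are code chunks converted to vectors")` returns the third chunk, because it shares the most vocabulary with the question; the real system does the same nearest-neighbour lookup in the embedding space of the provider you configure below.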

Embedder types

DeepWiki supports four embedding providers, selected with the DEEPWIKI_EMBEDDER_TYPE environment variable.
| Type | Model | API key required | Notes |
| --- | --- | --- | --- |
| openai | text-embedding-3-small (256 dimensions) | OPENAI_API_KEY | Default. Batch size 500. |
| google | gemini-embedding-001 | GOOGLE_API_KEY | Reuses your existing Gemini key. Batch size 100. |
| ollama | nomic-embed-text | None | Requires a local Ollama installation. |
| bedrock | amazon.titan-embed-text-v2:0 (256 dimensions) | AWS credentials | Batch size 100. |

Setting the embedder type

Add DEEPWIKI_EMBEDDER_TYPE to your .env file or environment:
# OpenAI (default — no variable needed, but explicit here for clarity)
DEEPWIKI_EMBEDDER_TYPE=openai

# Google AI
DEEPWIKI_EMBEDDER_TYPE=google

# Local Ollama
DEEPWIKI_EMBEDDER_TYPE=ollama

# AWS Bedrock
DEEPWIKI_EMBEDDER_TYPE=bedrock
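If you choose the ollama embedder, the nomic-embed-text model must be available locally before indexing. Pulling it uses the standard Ollama CLI (this assumes Ollama itself is already installed; the guard below only keeps the snippet safe to run on machines without it):

```shell
# Download the embedding model DeepWiki's ollama embedder uses,
# then list local models to confirm it is available.
if command -v ollama >/dev/null 2>&1; then
  ollama pull nomic-embed-text
  ollama list
else
  echo "ollama not found; install it from https://ollama.com first"
fi
```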

Provider setup

OpenAI is the default embedder. The text-embedding-3-small model is used with 256 dimensions and float encoding.

Required environment variable:

OPENAI_API_KEY=your_openai_api_key

Optional custom base URL. If you need to route embedding requests through a private endpoint, set:

OPENAI_BASE_URL=https://your-endpoint.com/v1

This is the same variable used by the OpenAI text generation client (see Model providers).

embedder.json excerpt:
{
  "embedder": {
    "client_class": "OpenAIClient",
    "batch_size": 500,
    "model_kwargs": {
      "model": "text-embedding-3-small",
      "dimensions": 256,
      "encoding_format": "float"
    }
  }
}

Using OpenAI-compatible embedding models

Some providers (such as Alibaba Cloud’s Qwen family) expose an OpenAI-compatible embeddings API. DeepWiki ships a ready-made config template for this case at api/config/embedder.openai_compatible.json.bak. To switch to an OpenAI-compatible embedder:
  1. Replace api/config/embedder.json with the contents of the compatible template:
{
  "embedder": {
    "client_class": "OpenAIClient",
    "initialize_kwargs": {
      "api_key": "${OPENAI_API_KEY}",
      "base_url": "${OPENAI_BASE_URL}"
    },
    "batch_size": 10,
    "model_kwargs": {
      "model": "text-embedding-v3",
      "dimensions": 256,
      "encoding_format": "float"
    }
  },
  "retriever": {
    "top_k": 20
  },
  "text_splitter": {
    "split_by": "word",
    "chunk_size": 350,
    "chunk_overlap": 100
  }
}
  2. Set the environment variables:
OPENAI_API_KEY=your_provider_api_key
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DeepWiki automatically substitutes ${OPENAI_API_KEY} and ${OPENAI_BASE_URL} placeholders in embedder.json with the values from your environment. No code changes are needed.
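The substitution behaviour can be mimicked in a few lines. The `substitute_env` helper below is a hypothetical sketch of what happens, not DeepWiki's actual implementation:

```python
import json
import os
import re

def substitute_env(text):
    """Replace ${VAR} placeholders with environment values.
    Placeholders for unset variables are left untouched."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),
        text,
    )

# A fragment shaped like the initialize_kwargs block above.
raw = '{"api_key": "${OPENAI_API_KEY}", "base_url": "${OPENAI_BASE_URL}"}'

os.environ["OPENAI_API_KEY"] = "sk-demo"
os.environ["OPENAI_BASE_URL"] = "https://dashscope.aliyuncs.com/compatible-mode/v1"

config = json.loads(substitute_env(raw))
```

After substitution, `config["api_key"]` holds the value from the environment rather than the literal placeholder, which is why no code changes are needed when you switch providers.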

Switching embedders

Switching DEEPWIKI_EMBEDDER_TYPE after a repository has already been indexed requires regenerating that repository’s embeddings. Embeddings from different models occupy different vector spaces and are not interchangeable. Delete the existing database for the repository under ~/.adalflow/databases/ and regenerate the wiki to rebuild the index with the new embedder.
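A deletion helper might look like the sketch below. The naming scheme under ~/.adalflow/databases/ is an assumption here; inspect the directory to find the entry that matches your repository before deleting anything:

```python
import shutil
from pathlib import Path

def clear_repo_index(db_dir, repo_key):
    """Delete cached index entries whose names start with repo_key.

    Returns the number of entries removed. repo_key is a hypothetical
    per-repository prefix; check the actual file names first.
    """
    removed = 0
    for entry in Path(db_dir).glob(f"{repo_key}*"):
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed

# Typical call (path and repository key are illustrative):
# clear_repo_index(Path.home() / ".adalflow" / "databases", "owner_repo")
```

After clearing the entry, regenerate the wiki so the repository is re-chunked and re-embedded with the new provider.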

Text splitting configuration

Regardless of which embedder you choose, DeepWiki splits source files into overlapping chunks before embedding them. The defaults are defined in embedder.json:
{
  "text_splitter": {
    "split_by": "word",
    "chunk_size": 350,
    "chunk_overlap": 100
  },
  "retriever": {
    "top_k": 20
  }
}
  • chunk_size: Maximum number of words per chunk (350 by default).
  • chunk_overlap: Number of words shared between adjacent chunks (100 by default), preserving context at boundaries.
  • top_k: Number of chunks retrieved per query during RAG (20 by default).
These values can be tuned in embedder.json or in a custom config directory specified by DEEPWIKI_CONFIG_DIR.
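The word-based splitting with overlap can be sketched as follows; this mirrors the configured behaviour but is not DeepWiki's exact splitter:

```python
def split_by_word(text, chunk_size=350, chunk_overlap=100):
    """Split text into overlapping word chunks.

    Consecutive chunks share chunk_overlap words, so context that
    straddles a chunk boundary appears in both neighbours.
    """
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final window already reached the end
    return chunks
```

With the defaults, an 800-word file yields three chunks (windows starting at words 0, 250, and 500), each sharing its last 100 words with the start of the next. Raising chunk_overlap improves boundary context at the cost of more chunks to embed.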
