DeepWiki Open uses three JSON configuration files to control AI model selection, embedding behaviour, and repository file handling. By default these files live in
api/config/ inside the project root. You can move them anywhere and point DeepWiki at the new location by setting the DEEPWIKI_CONFIG_DIR environment variable — no code changes required.
generator.json
generator.json defines every LLM provider DeepWiki can use for text generation. It controls which providers are available in the UI, which models each provider exposes, and the sampling parameters applied to each model.
Top-level fields
| Field | Type | Description |
|---|---|---|
| default_provider | string | The provider selected by default in the UI (e.g. "google"). |
| providers | object | Map of provider ID to provider configuration. |
Provider configuration fields
| Field | Type | Description |
|---|---|---|
| default_model | string | Model pre-selected when this provider is chosen. |
| supportsCustomModel | boolean | When true, users can type a model ID not listed under models. |
| models | object | Map of model ID to sampling parameter object. |
| client_class | string | Internal client class name (required only for bedrock and azure). |
Model sampling parameters
Most providers use top-level temperature, top_p, and optionally top_k. Ollama models nest these under an options key along with num_ctx.
| Field | Type | Description |
|---|---|---|
| temperature | number | Sampling temperature. Higher values increase randomness. |
| top_p | number | Nucleus sampling threshold. |
| top_k | number | Top-k sampling limit (used by Google models). |
| options.num_ctx | integer | Context window size in tokens (Ollama only). |
Full file
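The file follows the structure described above. The sketch below is illustrative only; the provider IDs, model IDs, and parameter values shown are assumptions, so treat the generator.json shipped with your installation as authoritative.

```json
{
  "default_provider": "google",
  "providers": {
    "google": {
      "default_model": "gemini-2.0-flash",
      "supportsCustomModel": true,
      "models": {
        "gemini-2.0-flash": { "temperature": 0.7, "top_p": 0.8, "top_k": 20 }
      }
    },
    "ollama": {
      "default_model": "qwen3:1.7b",
      "supportsCustomModel": true,
      "models": {
        "qwen3:1.7b": {
          "options": { "temperature": 0.7, "top_p": 0.8, "num_ctx": 32000 }
        }
      }
    }
  }
}
```

Note how the ollama entry nests its sampling parameters under options, as described above.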
Adding a new model
To add a model without changing code, add an entry under the appropriate provider's models map and restart the API server:
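For example, to expose an extra model under the openai provider, extend its models map (both model IDs and the parameter values below are placeholders):

```json
"models": {
  "gpt-4o": { "temperature": 0.7, "top_p": 0.8 },
  "my-new-model": { "temperature": 0.5, "top_p": 0.9 }
}
```

After the restart, the new entry should be selectable in the UI for that provider.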
Adding a new provider
Add a new key under providers with at least default_model, supportsCustomModel, and models. If the provider requires a custom client class (as bedrock and azure do), also set client_class to the appropriate internal class name.
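A minimal sketch of such an entry (the provider ID, model ID, and parameter values are placeholders):

```json
"my_provider": {
  "default_model": "my-model",
  "supportsCustomModel": true,
  "models": {
    "my-model": { "temperature": 0.7, "top_p": 0.8 }
  }
}
```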
embedder.json
embedder.json configures the embedding model used to convert repository code into vectors for retrieval, the retriever’s top-k setting, and the text splitter that chunks source files before embedding.
Top-level sections
| Section | Description |
|---|---|
| embedder | OpenAI-compatible embedding client (default, uses text-embedding-3-small). |
| embedder_ollama | Local Ollama embedding client. |
| embedder_google | Google AI embedding client (uses gemini-embedding-001). |
| embedder_bedrock | Amazon Bedrock embedding client. |
| retriever | Controls how many chunks are retrieved per query (top_k). |
| text_splitter | Controls how source files are chunked before embedding. |
Embedder fields
| Field | Type | Description |
|---|---|---|
| client_class | string | Internal client class: "OpenAIClient", "OllamaClient", "GoogleEmbedderClient", or "BedrockClient". |
| batch_size | integer | Number of texts submitted per embedding API call. |
| model_kwargs.model | string | Embedding model identifier. |
| model_kwargs.dimensions | integer | Output vector dimensionality (OpenAI and Bedrock). |
| model_kwargs.encoding_format | string | Vector encoding format (OpenAI): "float". |
| model_kwargs.task_type | string | Embedding task hint (Google): "SEMANTIC_SIMILARITY". |
Retriever fields
| Field | Type | Description |
|---|---|---|
| top_k | integer | Number of most relevant chunks to retrieve per query. Default is 20. |
Text splitter fields
| Field | Type | Description |
|---|---|---|
| split_by | string | Unit of splitting: "word". |
| chunk_size | integer | Maximum number of words per chunk. Default is 350. |
| chunk_overlap | integer | Number of words that overlap between consecutive chunks. Default is 100. |
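For example, with the defaults (chunk_size 350, chunk_overlap 100), each chunk shares its first 100 words with the end of the previous chunk, so consecutive chunks start roughly 250 words apart and a 1,000-word file yields about four chunks.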
Full file
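An illustrative sketch of the default OpenAI configuration together with the retriever and text splitter sections is shown below. The alternative embedder_* sections follow the same field layout; values such as batch_size and dimensions are assumptions here, so consult the embedder.json in your installation for the authoritative contents.

```json
{
  "embedder": {
    "client_class": "OpenAIClient",
    "batch_size": 500,
    "model_kwargs": {
      "model": "text-embedding-3-small",
      "dimensions": 256,
      "encoding_format": "float"
    }
  },
  "retriever": {
    "top_k": 20
  },
  "text_splitter": {
    "split_by": "word",
    "chunk_size": 350,
    "chunk_overlap": 100
  }
}
```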
Selecting an embedder type
The active embedder section is chosen by setting the DEEPWIKI_EMBEDDER_TYPE environment variable:
| Value | Section used | API key required |
|---|---|---|
| openai (default) | embedder | OPENAI_API_KEY |
| google | embedder_google | GOOGLE_API_KEY |
| ollama | embedder_ollama | None (local) |
| bedrock | embedder_bedrock | AWS credentials |
Using OpenAI-compatible embeddings (e.g. Alibaba Qwen)
DeepWiki ships an alternative template for embedding services that implement the OpenAI API. The template is stored at api/config/embedder.openai_compatible.json.bak. To use it:
- Replace the contents of api/config/embedder.json with the contents of api/config/embedder.openai_compatible.json.bak.
- Set the relevant environment variables (OPENAI_API_KEY and OPENAI_BASE_URL) in your .env file.

DeepWiki substitutes the $OPENAI_API_KEY and $OPENAI_BASE_URL placeholders in the file at startup, so no code changes are needed.
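The shipped .bak file is the source of truth for this template. Purely as an illustration of the pattern, it is an embedder section whose credentials point at the environment placeholders; the surrounding field names and model values below are assumptions and may differ from the actual template:

```json
{
  "embedder": {
    "client_class": "OpenAIClient",
    "initialize_kwargs": {
      "api_key": "$OPENAI_API_KEY",
      "base_url": "$OPENAI_BASE_URL"
    },
    "batch_size": 100,
    "model_kwargs": {
      "model": "text-embedding-v3",
      "dimensions": 1024,
      "encoding_format": "float"
    }
  }
}
```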
repo.json
repo.json controls which files and directories DeepWiki reads when cloning and indexing a repository. It also sets a maximum repository size. Adjusting this file lets you narrow or broaden what DeepWiki analyses without modifying application code.
Top-level sections
| Section | Description |
|---|---|
| file_filters | Lists of excluded directories and file patterns. |
| repository | Global repository constraints. |
file_filters fields
| Field | Type | Description |
|---|---|---|
| excluded_dirs | string[] | Directory paths (relative to repo root) to skip entirely during traversal. |
| excluded_files | string[] | File names or glob patterns to exclude. Supports wildcards like *.min.js. |
repository fields
| Field | Type | Description |
|---|---|---|
| max_size_mb | integer | Maximum repository size in megabytes. Repositories exceeding this limit are rejected. Default is 50000. |
Full file
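An abbreviated, illustrative sketch is shown below; the shipped repo.json contains much longer exclusion lists, and the specific directory and file entries here are examples rather than the actual defaults.

```json
{
  "file_filters": {
    "excluded_dirs": ["./.git/", "./node_modules/", "./__pycache__/"],
    "excluded_files": ["*.min.js", "package-lock.json", "*.pyc"]
  },
  "repository": {
    "max_size_mb": 50000
  }
}
```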
Customizing file filters
To include files that are excluded by default (for example, *.cfg configuration files that are meaningful to your project), remove the relevant entry from excluded_files. To exclude additional paths, append them to the appropriate list.
You can also pass per-request overrides via the excluded_dirs, excluded_files, included_dirs, and included_files fields on the POST /chat/completions/stream and WebSocket /ws/chat endpoints — these take effect without modifying the config file.
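A hypothetical request body for POST /chat/completions/stream illustrating such overrides (the repo_url, the message shape, and the assumption that the filter fields accept arrays of patterns are all illustrative; see the API reference for the exact schema):

```json
{
  "repo_url": "https://github.com/your-org/your-repo",
  "messages": [
    { "role": "user", "content": "Explain the ingestion pipeline." }
  ],
  "excluded_dirs": ["./vendor/", "./third_party/"],
  "excluded_files": ["*.generated.ts"]
}
```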
Environment variables quick reference
| Variable | Description | Default |
|---|---|---|
| DEEPWIKI_CONFIG_DIR | Directory containing all three config files. | api/config/ |
| DEEPWIKI_EMBEDDER_TYPE | Active embedder section: openai, google, ollama, or bedrock. | openai |
| OPENAI_BASE_URL | Custom base URL for OpenAI-compatible embedding or model endpoints. | OpenAI default |
| OLLAMA_HOST | Ollama server URL for local model and embedding requests. | http://localhost:11434 |
| LOG_LEVEL | Logging verbosity: DEBUG, INFO, WARNING, ERROR, or CRITICAL. | INFO |
| LOG_FILE_PATH | Path to write log output. Must be inside api/logs/. | api/logs/application.log |