Ollama lets you run open-source language models directly on your own hardware. When DeepWiki is configured to use Ollama, the entire pipeline (embedding, retrieval, and wiki generation) runs locally. No API keys, no usage costs, and no data leaves your machine.
## Why use Ollama with DeepWiki?
- Free: No per-token charges from any cloud provider
- Private: Source code never leaves your machine
- Offline: Works without an internet connection after models are downloaded
- No API key required: Skip the sign-up and key management overhead
## Hardware requirements
For acceptable performance with the default models:

| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8+ cores |
| RAM | 8 GB | 16 GB+ |
| Storage | 10 GB free | 20 GB+ free |
| GPU | Optional | Strongly recommended |
## Available Ollama models
The following generator models are pre-configured in `api/config/generator.json`:
| Model | Context window | Use case |
|---|---|---|
| `qwen3:1.7b` (default) | 32,000 tokens | Good balance of speed and quality |
| `llama3:8b` | 8,000 tokens | Higher quality, slower |
| `qwen3:8b` | 32,000 tokens | Best quality, requires more RAM |
For embeddings, DeepWiki uses `nomic-embed-text` when `DEEPWIKI_EMBEDDER_TYPE=ollama`.
## Setup steps
### Pull the required models
Pull the embedding model and at least one generator model. The first pull downloads several gigabytes of model weights; subsequent runs use the local cache.
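The exact commands depend on which models you choose; a typical sequence for the defaults named above looks like this:

```bash
# Embedding model used when DEEPWIKI_EMBEDDER_TYPE=ollama
ollama pull nomic-embed-text

# Default generator model; swap in llama3:8b or qwen3:8b if preferred
ollama pull qwen3:1.7b

# Confirm both models are available locally
ollama list
```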
### Configure the Ollama embedder
Replace the default embedder configuration with the Ollama-specific one. When prompted to confirm the overwrite, enter `y`.
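The sketch below assumes the Ollama variant ships as `api/config/embedder_ollama.json` next to the default `api/config/embedder.json`; the actual filename may differ in your checkout, so verify it before copying:

```bash
# Assumed filename: embedder_ollama.json (check api/config/ in your checkout)
# cp -i asks before overwriting the existing embedder.json; answer y to confirm
cp -i api/config/embedder_ollama.json api/config/embedder.json
```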
### Create a .env file
No API keys are needed for a fully local setup. If Ollama is running on the same machine, `OLLAMA_HOST` defaults to `http://localhost:11434` and does not need to be set.
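A minimal `.env` for a fully local run only needs to select the Ollama embedder; the host override is shown commented out because the default already matches a local server:

```bash
# .env: minimal fully local configuration
DEEPWIKI_EMBEDDER_TYPE=ollama

# Only needed if Ollama is not reachable at the default address
# OLLAMA_HOST=http://localhost:11434
```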
### Generate a wiki with Ollama
1. Open http://localhost:3000
2. Enter a GitHub, GitLab, or Bitbucket repository URL
3. Enable the Use Local Ollama Model option in the model selector
4. Click Generate Wiki
## Docker + Ollama setup
If you prefer to run DeepWiki inside Docker while Ollama runs on the host, use the dedicated Ollama Dockerfile. On Apple Silicon Macs, the Dockerfile automatically selects ARM64 binaries for better native performance.
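As a sketch, assuming the Ollama Dockerfile is named `Dockerfile-ollama-local` and the API listens on port 8001 (both assumptions to check against the repository), the build and run steps look roughly like this:

```bash
# Build from the Ollama-specific Dockerfile (assumed name: Dockerfile-ollama-local)
docker build -f Dockerfile-ollama-local -t deepwiki-ollama .

# Run DeepWiki; host.docker.internal lets the container reach Ollama on the host
# (frontend on 3000; API port 8001 is an assumption, adjust if yours differs)
docker run -p 3000:3000 -p 8001:8001 \
  --add-host=host.docker.internal:host-gateway \
  -e DEEPWIKI_EMBEDDER_TYPE=ollama \
  -e OLLAMA_HOST=http://host.docker.internal:11434 \
  deepwiki-ollama
```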
## Using a custom generator model
To switch to any model available in `ollama list`, edit `api/config/generator.json` and change the `"model"` value under the `ollama` provider:
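Only the `"model"`, `supportsCustomModel`, and `num_ctx` keys below are named in this guide; the surrounding nesting is an assumption, so match it to the structure of the actual file rather than copying this verbatim:

```jsonc
// Sketch only: the "providers"/"options" nesting is assumed, not confirmed
{
  "providers": {
    "ollama": {
      "model": "qwen3:8b",            // any model shown by `ollama list`
      "supportsCustomModel": true,
      "options": { "num_ctx": 32000 } // context window; lower it to save RAM
    }
  }
}
```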
Any locally pulled model works because `supportsCustomModel` is set to `true` for the Ollama provider.
## Troubleshooting
### “Cannot connect to Ollama server”

- Run `ollama list` to verify Ollama is running
- Check that nothing else is using port `11434`
- Try restarting Ollama
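Each of these checks can be run from a terminal; `/api/tags` is Ollama's standard endpoint for listing installed models:

```bash
# Confirm the Ollama server responds on its default port
curl http://localhost:11434/api/tags

# See which process, if any, is bound to the port
lsof -i :11434

# Start (or restart) the server in the foreground to watch its logs
ollama serve
```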
### Generation is slow

- Local models are inherently slower than cloud APIs
- Start with `qwen3:1.7b` (smallest, fastest) and only move to larger models if quality is insufficient
- A GPU with at least 8 GB VRAM significantly improves throughput
### Out of memory

- Use a smaller model such as `phi3:mini`
- Close other memory-intensive applications before running Ollama
- Reduce the `num_ctx` value in `generator.json` to lower memory usage