Ollama enables running large language models locally on your machine.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/zeroclaw-labs/zeroclaw/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Ollama provides:- Local model execution (no API costs)
- Privacy (data stays on your machine)
- Offline operation
- Fast inference on local hardware
- Llama 2/3
- Mistral
- Mixtral
- Phi
- Gemma
- And more
Prerequisites
Install Ollama
- macOS
- Linux
- Windows
Start Ollama Server
http://localhost:11434
Pull a Model
Configuration
Config File
CLI Usage
Features
Tool Calling
Ollama supports tool calling for compatible models:llama3:70bmixtral:8x7bmistral
Smaller models may have limited tool calling capabilities.
Streaming
Real-time response streaming:Custom Parameters
Model Selection Guide
For General Use
For Coding
For Maximum Quality
Performance Tuning
GPU Acceleration
Ollama automatically uses GPU if available (CUDA, Metal, ROCm). Check GPU usage:Context Window
Adjust context size:Batch Size
Request Format
Ollama uses a simple JSON format:Troubleshooting
'Connection refused' error
'Connection refused' error
Solution:Start Ollama server:Verify it’s running:
'Model not found' error
'Model not found' error
Solution:Pull the model first:
Slow inference
Slow inference
Solutions:
- Use smaller model:
- Reduce context window:
- Enable GPU acceleration (requires compatible hardware)
Out of memory
Out of memory
Solutions:
- Use smaller model
- Reduce context window
- Close other applications
- Reduce num_batch