The Ollama provider lets you run open-source models locally on your machine. This gives you complete privacy, offline capabilities, and no API costs, which makes it a good fit for development, experimentation, and applications that require data privacy.
Installation
1. Install Ollama
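First, install Ollama on your system (macOS / Linux).
2. Install Genkit Plugin
Install the `genkit` and `genkitx-ollama` packages, for example with `npm install genkit genkitx-ollama`.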
3. Pull Models
Download the models you want to use with `ollama pull <model>` (for example, `ollama pull llama3`).
Setup
Basic Configuration
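A minimal configuration sketch, assuming Genkit 1.x, the `genkitx-ollama` plugin's `ollama({ serverAddress, models })` options, a local Ollama server on its default port 11434, and that `llama3` has already been pulled:

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

// Register the Ollama plugin with one locally pulled model.
export const ai = genkit({
  plugins: [
    ollama({
      // Default address of a local Ollama server.
      serverAddress: 'http://localhost:11434',
      // Models registered here are addressed as 'ollama/<name>'.
      models: [{ name: 'llama3' }],
    }),
  ],
});
```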
Remote Ollama Server
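To reach Ollama on another machine, the plugin only needs a different `serverAddress`. This sketch assumes the remote server listens on a non-loopback interface (for example, started with `OLLAMA_HOST=0.0.0.0`); the `OLLAMA_SERVER_ADDRESS` environment variable and the LAN address are hypothetical placeholders:

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

// Hypothetical env var and LAN address; substitute the address of your Ollama host.
const serverAddress = process.env.OLLAMA_SERVER_ADDRESS ?? 'http://192.168.1.50:11434';

export const ai = genkit({
  plugins: [
    ollama({
      serverAddress, // remote Ollama server instead of localhost
      models: [{ name: 'llama3' }],
    }),
  ],
});
```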
Connect to Ollama running on a different machine by setting `serverAddress` to that machine's address.
With Custom Headers
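Ollama does not authenticate requests itself, so headers like these are typically checked by a reverse proxy in front of it. A sketch assuming the plugin's `requestHeaders` option accepts a static object; the proxy URL and the `OLLAMA_PROXY_KEY` variable are hypothetical:

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

export const ai = genkit({
  plugins: [
    ollama({
      // Hypothetical proxy URL in front of an Ollama server.
      serverAddress: 'https://ollama.example.com',
      // Static headers sent with every request; OLLAMA_PROXY_KEY is a hypothetical env var.
      requestHeaders: {
        Authorization: `Bearer ${process.env.OLLAMA_PROXY_KEY ?? ''}`,
      },
      models: [{ name: 'llama3' }],
    }),
  ],
});
```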
Add authentication or other headers to every request.
Dynamic Headers
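A sketch of request-time headers, assuming `requestHeaders` also accepts an async function that the plugin calls when it sends a request. `fetchAccessToken` and `OLLAMA_PROXY_TOKEN` are hypothetical stand-ins for your own token source:

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

// Hypothetical token source; swap in your real auth client (e.g. one that refreshes tokens).
export async function fetchAccessToken(): Promise<string> {
  return process.env.OLLAMA_PROXY_TOKEN ?? '';
}

export const ai = genkit({
  plugins: [
    ollama({
      serverAddress: 'https://ollama.example.com', // hypothetical proxy URL
      // Evaluated at request time, so a refreshed token can be picked up without a restart.
      requestHeaders: async () => ({
        Authorization: `Bearer ${await fetchAccessToken()}`,
      }),
      models: [{ name: 'llama3' }],
    }),
  ],
});
```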
Use a function to compute headers at request time.
Available Models
Ollama supports many open-source models, including:
Text Generation
- `llama3` - Meta Llama 3, 8B parameters, fast and capable
- `llama3:70b` - Meta Llama 3, 70B parameters, more powerful
- `mistral` - 7B, excellent performance
- `mistral-nemo` - 12B, enhanced capabilities
- `gemma` - 2B/7B, efficient models
- `gemma2` - 9B/27B, improved versions
- `phi3` - 3.8B, small but powerful
- `phi3:medium` - 14B parameters
- `qwen2` - multiple sizes available
- `deepseek-r1` - reasoning model
Code Generation
- `codellama` - code-specialized Llama
- `starcoder2` - code generation
- `codegemma` - Google's code model
Embeddings
- `nomic-embed-text` - high-quality embeddings
- `mxbai-embed-large` - large embedding model
- `all-minilm` - lightweight embeddings
Vision Models
- `llava` - Llama + vision
- `bakllava` - alternative vision model
Usage Examples
Basic Text Generation
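A basic generation sketch, assuming Genkit 1.x (`ai.generate` resolving to a response with a `text` property) and a local server with `llama3` pulled. `tellJoke` is a hypothetical helper name:

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ serverAddress: 'http://localhost:11434', models: [{ name: 'llama3' }] })],
});

// Sends a single prompt to the local model and returns the generated text.
export async function tellJoke(): Promise<string> {
  const { text } = await ai.generate({
    model: 'ollama/llama3', // '<plugin>/<model name>'
    prompt: 'Tell me a short joke about compilers.',
  });
  return text;
}
```

Call `tellJoke()` once the Ollama server is running (for example via `ollama serve`).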
Using Model References
Streaming Responses
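A streaming sketch, assuming Genkit 1.x's `ai.generateStream`, which returns an async-iterable `stream` of chunks plus a `response` promise for the final result. `streamPoem` is a hypothetical helper:

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ serverAddress: 'http://localhost:11434', models: [{ name: 'llama3' }] })],
});

// Prints tokens as they arrive, then returns the complete text.
export async function streamPoem(): Promise<string> {
  const { stream, response } = ai.generateStream({
    model: 'ollama/llama3',
    prompt: 'Write a four-line poem about the sea.',
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.text); // partial output from the model
  }
  return (await response).text; // aggregated final response
}
```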
Function Calling
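A tool-calling sketch, assuming Genkit 1.x's `ai.defineTool` and a `supports: { tools: true }` field on the plugin's model definition. The `llama3.1` model (assumed to be tool-capable in Ollama) and the stubbed `getWeather` tool are illustrative choices, not part of the original text:

```typescript
import { genkit, z } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [
    ollama({
      serverAddress: 'http://localhost:11434',
      // Tool calling requires the 'chat' API type; supports.tools flags the capability.
      models: [{ name: 'llama3.1', type: 'chat', supports: { tools: true } }],
    }),
  ],
});

// Hypothetical tool with a stubbed implementation.
export const getWeather = ai.defineTool(
  {
    name: 'getWeather',
    description: 'Returns the current weather for a city',
    inputSchema: z.object({ city: z.string() }),
    outputSchema: z.string(),
  },
  async ({ city }) => `It is 22°C and sunny in ${city}.`,
);

// The model may call getWeather; Genkit runs the tool and feeds the result back.
export async function askWeather(): Promise<string> {
  const { text } = await ai.generate({
    model: 'ollama/llama3.1',
    prompt: 'What is the weather in Paris?',
    tools: [getWeather],
  });
  return text;
}
```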
Tool calling is only supported on models configured with `type: 'chat'` (the default). Not all Ollama models support tools, so test with your specific model.
Multimodal (Vision)
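A vision sketch, assuming Genkit's media parts (`{ media: { url, contentType } }`) are forwarded by the plugin to a vision model such as `llava`. `describeImage` is hypothetical, and the image is assumed to be a JPEG file on disk:

```typescript
import { readFileSync } from 'node:fs';
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ serverAddress: 'http://localhost:11434', models: [{ name: 'llava' }] })],
});

// Sends an image (as a base64 data URL) plus a text instruction to the vision model.
export async function describeImage(path: string): Promise<string> {
  const base64 = readFileSync(path).toString('base64');
  const { text } = await ai.generate({
    model: 'ollama/llava',
    prompt: [
      { media: { url: `data:image/jpeg;base64,${base64}`, contentType: 'image/jpeg' } },
      { text: 'Describe this image in one sentence.' },
    ],
  });
  return text;
}
```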
Text Embeddings
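An embedding sketch, assuming the plugin's `embedders` option takes `{ name, dimensions }` and that Genkit 1.x `ai.embed` resolves to an array of `{ embedding: number[] }`. The 768 dimensions correspond to `nomic-embed-text`; `embedText` is a hypothetical helper:

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [
    ollama({
      serverAddress: 'http://localhost:11434',
      // Embedders must declare their vector size.
      embedders: [{ name: 'nomic-embed-text', dimensions: 768 }],
    }),
  ],
});

// Returns the embedding vector for a single piece of text.
export async function embedText(content: string): Promise<number[]> {
  const [first] = await ai.embed({ embedder: 'ollama/nomic-embed-text', content });
  return first.embedding;
}
```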
For embedders, you must specify the `dimensions` in the plugin configuration.
Using Different Model Sizes
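One way to choose a size is to map available RAM onto the tiers in the Model Size Guide table further down. This hypothetical `pickModelForRam` helper uses those RAM figures and model tags from the list above; the tier-to-tag mapping is an illustrative assumption:

```typescript
// RAM thresholds (GB) follow the Model Size Guide table; tags come from the model list.
const SIZE_TIERS: ReadonlyArray<{ minRamGb: number; model: string }> = [
  { minRamGb: 32, model: 'llama3:70b' },  // 30B-70B tier
  { minRamGb: 16, model: 'phi3:medium' }, // 13B-14B tier
  { minRamGb: 8, model: 'llama3' },       // 7B-8B tier
  { minRamGb: 4, model: 'gemma:2b' },     // 2B-3B tier
];

// Returns the largest model tag whose RAM requirement fits.
export function pickModelForRam(ramGb: number): string {
  const tier = SIZE_TIERS.find((t) => ramGb >= t.minRamGb);
  if (!tier) throw new RangeError(`At least 4 GB of RAM is needed; got ${ramGb} GB`);
  return tier.model;
}
```

The result can be used as `` model: `ollama/${pickModelForRam(16)}` `` once that model is pulled and registered in the plugin's `models` list.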
Using in a Flow
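A flow sketch, assuming Genkit 1.x `ai.defineFlow` with Zod schemas exported from `genkit`. The `summarizeFlow` name and prompt are hypothetical:

```typescript
import { genkit, z } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ serverAddress: 'http://localhost:11434', models: [{ name: 'llama3' }] })],
});

// A typed flow: string in, string out, backed by the local llama3 model.
export const summarizeFlow = ai.defineFlow(
  { name: 'summarizeFlow', inputSchema: z.string(), outputSchema: z.string() },
  async (input) => {
    const { text } = await ai.generate({
      model: 'ollama/llama3',
      prompt: `Summarize the following in two sentences:\n\n${input}`,
    });
    return text;
  },
);
```

Flows are callable, so `await summarizeFlow('...')` runs it directly.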
Configuration Options
Model Configuration
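Generation settings are passed per request through `config`. This sketch assumes the plugin maps Genkit's common `temperature` and `maxOutputTokens` options onto Ollama's request options; `listUses` is hypothetical:

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

const ai = genkit({
  plugins: [ollama({ serverAddress: 'http://localhost:11434', models: [{ name: 'llama3' }] })],
});

export async function listUses(): Promise<string> {
  const { text } = await ai.generate({
    model: 'ollama/llama3',
    prompt: 'List three uses for a locally hosted LLM.',
    config: {
      temperature: 0.2,     // lower values give more deterministic output
      maxOutputTokens: 256, // caps the length of the generated answer
    },
  });
  return text;
}
```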
Model Types
Ollama supports two API types:
`chat` (default):
- Multi-turn conversations
- Function calling support
- System messages
`generate`:
- Simple text completion
- No conversation history
- No tool support
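The API type is chosen per model through the `type` field of the plugin's model definitions. A sketch assuming `type` accepts `'chat'` or `'generate'`; pairing `gemma` with `generate` is only an illustration:

```typescript
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';

export const ai = genkit({
  plugins: [
    ollama({
      serverAddress: 'http://localhost:11434',
      models: [
        { name: 'llama3', type: 'chat' },    // conversations, system messages, tools
        { name: 'gemma', type: 'generate' }, // single-shot completion only
      ],
    }),
  ],
});
```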
Model Capabilities
Specify which features a model supports.
Managing Models
Pull Models
List Models
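Besides the CLI, the local models can be listed through Ollama's REST endpoint `GET /api/tags`, whose response has a `models` array with a `name` per model. A sketch: `listLocalModels` is hypothetical, and `fetchImpl` defaults to the global `fetch` (Node 18+) but can be swapped for a stub in tests:

```typescript
interface TagsResponse {
  models: Array<{ name: string; size: number }>;
}

type FetchJson = (url: string) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Returns the names (e.g. "llama3:latest") of models available on the server.
export async function listLocalModels(
  serverAddress = 'http://localhost:11434',
  fetchImpl: FetchJson = fetch,
): Promise<string[]> {
  const res = await fetchImpl(`${serverAddress}/api/tags`);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const body = (await res.json()) as TagsResponse;
  return body.models.map((m) => m.name);
}
```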
Remove Models
Show Model Info
Create Custom Models
Create a `Modelfile`, then build a model from it with `ollama create <name> -f Modelfile`.
Performance Optimization
GPU Acceleration
Ollama automatically uses a GPU if one is available:
- NVIDIA GPUs: CUDA
- AMD GPUs: ROCm
- Apple Silicon: Metal
Model Quantization
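Quantization saves memory because weight storage scales with bits per weight: roughly parameters × bits / 8 bytes. This hypothetical helper computes that weights-only estimate (1 GB = 10^9 bytes); it ignores the KV cache and runtime overhead, and real quantization formats add a little extra per block:

```typescript
// Approximate size of model weights in GB; excludes KV cache and runtime overhead.
export function estimateWeightsGb(paramsBillions: number, bitsPerWeight: number): number {
  // params * bits / 8 = bytes, and billions of bytes = GB
  return (paramsBillions * bitsPerWeight) / 8;
}
```

For example, an 8B model needs about 16 GB for 16-bit weights but about 4 GB at 4 bits, and a 70B model at 4 bits needs about 35 GB.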
Use quantized models for faster inference and lower memory use.
Concurrent Requests
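A single machine has limited capacity, so it can help to cap how many generation calls a client sends at once. A generic sketch: `mapWithConcurrency` is a hypothetical helper (not a Genkit API) that runs at most `limit` async tasks at a time and returns results in input order:

```typescript
export async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new RangeError('limit must be a positive integer');
  const results = new Array<R>(items.length);
  let next = 0; // index of the next item to start; safe because JS runs one callback at a time

  // Each runner repeatedly claims the next unstarted item until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => runner()));
  return results;
}
```

With a configured `ai` instance, `mapWithConcurrency(prompts, 2, (p) => ai.generate({ model: 'ollama/llama3', prompt: p }).then((r) => r.text))` keeps at most two requests in flight.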
Ollama can serve multiple requests concurrently; server-side parallelism is controlled by the `OLLAMA_NUM_PARALLEL` environment variable.
System Requirements
Minimum Requirements
- RAM: 8GB (for 7B models)
- Disk: 5GB per model
- CPU: Modern multi-core processor
Recommended for Larger Models
- RAM: 16GB+ (for 13B+ models)
- RAM: 32GB+ (for 70B models)
- GPU: 8GB+ VRAM for acceleration
Model Size Guide
| Model Size | RAM Required | Speed | Quality |
|---|---|---|---|
| 2B - 3B | 4GB | Very Fast | Good |
| 7B - 8B | 8GB | Fast | Very Good |
| 13B - 14B | 16GB | Medium | Excellent |
| 30B - 70B | 32GB+ | Slow | Outstanding |
Troubleshooting
Ollama Server Not Running
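A running Ollama server answers a plain `GET` on its root URL; if it does not, start it with `ollama serve`. A sketch of a startup check: `isOllamaRunning` is hypothetical, and `fetchImpl` is injectable so the check can be tested without a server:

```typescript
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

// Resolves true if the server responds with a 2xx status, false on HTTP or network errors.
export async function isOllamaRunning(
  serverAddress = 'http://localhost:11434',
  fetchImpl: FetchLike = fetch,
): Promise<boolean> {
  try {
    const res = await fetchImpl(serverAddress);
    return res.ok;
  } catch {
    return false; // e.g. ECONNREFUSED when nothing listens on the port
  }
}
```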
Model Not Found
Out of Memory
- Use a smaller model (e.g., `llama3:8b` instead of `llama3:70b`)
- Use a quantized version (e.g., `llama3:8b-instruct-q4_0`)
- Close other applications
- Increase system swap/virtual memory
Slow Performance
Solutions:
- Use GPU acceleration
- Use quantized models
- Use smaller models
- Reduce `maxOutputTokens`
- Increase system resources
Connection Refused
Best Practices
- Start with smaller models: `llama3:8b` is a good default
- Use quantized models in production to balance speed and quality
- Monitor system resources, especially RAM and GPU usage
- Keep models updated by running `ollama pull <model>` regularly
- Use model sizes appropriate for your hardware
- Enable GPU acceleration if available
- Keep frequently used models loaded in memory
- Test locally before deploying
Privacy Benefits
Complete data privacy:
- All processing happens locally
- No data is sent to external APIs
- No internet connection required (after model download)
- Full control over model versions
Well suited for:
- Healthcare applications (local processing can help with HIPAA compliance)
- Financial services
- Legal document processing
- Internal corporate tools
- Sensitive data analysis
Comparison: Ollama vs Cloud Providers
| Aspect | Ollama | Cloud APIs |
|---|---|---|
| Privacy | Complete | Depends on provider policies |
| Cost | Free (after hardware) | Pay-per-use |
| Internet | Not required | Required |
| Setup | Moderate | Simple |
| Performance | Depends on hardware | Consistent |
| Model Access | Open-source only | Proprietary + open |
| Latency | No network round-trip (local) | Network dependent |
| Scale | Single machine | Scales with provider capacity |