## Installation

### Install Ollama
First, install Ollama on your system.

### Start Ollama Server

Once installed, start the Ollama server; it listens on http://localhost:11434 by default.
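Both steps can be sketched for Linux/macOS (the one-line script is Ollama's official installer; Windows users download the installer from ollama.com instead):

```shell
# Official one-line installer for Linux/macOS
curl -fsSL https://ollama.com/install.sh | sh

# Start the server in the foreground (many installs also register it as a service)
ollama serve
```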
### Pull a Model
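Download a model before first use; the tags below are examples from the Ollama model library:

```shell
# Chat models
ollama pull llama3.2
ollama pull qwen2.5

# Embedding model
ollama pull nomic-embed-text
```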
## Quick Start
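A first request can be sketched with only the Python standard library against Ollama's REST API (any client library works; `llama3.2` is an example tag):

```python
import json
import urllib.request

def build_payload(model, prompt):
    """Assemble the request body for POST /api/chat."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model, prompt):
    """Send one chat turn to a locally running Ollama server."""
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Requires `ollama serve` running and the model pulled:
# print(chat("llama3.2", "Why is the sky blue?"))
```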
## Available Models

### Meta Llama Series
Meta’s open-source models with strong performance.

### Alibaba Qwen Series

High-quality multilingual models.

### Groq Llama Tool Use

Optimized for function calling.

### Granite Vision

Multimodal model with vision capabilities.

### DeepSeek Reasoning

Models with extended reasoning capabilities.

### Embedding Models
## Code Examples
### Basic Chat Completion
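A chat completion with a system prompt, sketched against the REST API with the standard library:

```python
import json
import urllib.request

def build_chat(model, system, user):
    """Chat body with a system prompt (POST /api/chat)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "stream": False,
    }

def complete(payload):
    """POST the payload to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# payload = build_chat("llama3.2", "You are a concise assistant.", "Explain RAM vs VRAM.")
# print(complete(payload))
```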
### Function Calling
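Ollama accepts OpenAI-style function schemas in a `tools` array on `/api/chat`. A sketch with a hypothetical `get_weather` tool (the tool and its schema are illustrative, not part of any real API):

```python
import json
import urllib.request

# Hypothetical example tool in the OpenAI-style schema Ollama accepts
GET_WEATHER = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_request(model, prompt, tools):
    """Chat body that offers the model a list of callable tools."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "stream": False,
    }

def request_tool_calls(payload):
    """Return the tool_calls the model decided to make (may be empty)."""
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"].get("tool_calls", [])

# calls = request_tool_calls(build_tool_request(
#     "llama3-groq-tool-use", "What's the weather in Paris?", [GET_WEATHER]))
```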
### Vision - Image Analysis
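Vision models take base64-encoded images in the `images` field of a chat message. A stdlib sketch (the `granite3.2-vision` tag is an example; any pulled vision-capable model works):

```python
import base64
import json
import urllib.request

def build_vision_request(model, prompt, image_path):
    """Attach a base64-encoded image to a chat message, as /api/chat expects."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode()
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt, "images": [encoded]}],
        "stream": False,
    }

def describe_image(payload):
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# print(describe_image(build_vision_request(
#     "granite3.2-vision", "What is in this image?", "photo.jpg")))
```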
### Structured Output
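The `format` field constrains the reply: it accepts the string `"json"` or a JSON schema. A sketch with an illustrative schema:

```python
import json
import urllib.request

# Example schema; `format` may also simply be the string "json"
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

def build_structured_request(model, prompt, schema):
    """Chat body asking for JSON output matching the schema via `format`."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": schema,
        "stream": False,
    }

def structured_chat(payload):
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The reply content is a JSON string conforming to the schema
        return json.loads(json.loads(resp.read())["message"]["content"])

# data = structured_chat(build_structured_request(
#     "llama3.2", "Describe a fictional person as JSON.", PERSON_SCHEMA))
```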
### Streaming Responses
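With `"stream": true` (the API default), `/api/chat` returns newline-delimited JSON chunks, each carrying a text delta. A stdlib sketch:

```python
import json
import urllib.request

def join_chunks(chunks):
    """Concatenate the text deltas from streamed chat chunks."""
    return "".join(
        c["message"]["content"] for c in chunks if not c.get("done"))

def stream_chat(model, prompt):
    """Yield text deltas from a streaming chat request."""
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one JSON object per line
            chunk = json.loads(line)
            if not chunk.get("done"):
                yield chunk["message"]["content"]

# for delta in stream_chat("llama3.2", "Tell me a short story."):
#     print(delta, end="", flush=True)
```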
### Embeddings
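Embeddings come from `POST /api/embed`, which returns one vector per input text. A sketch plus a cosine-similarity helper for comparing the vectors:

```python
import json
import math
import urllib.request

def embed(model, texts):
    """POST /api/embed; returns a list of embedding vectors."""
    req = urllib.request.Request(
        "http://localhost:11434/api/embed",
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embeddings"]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# vecs = embed("nomic-embed-text", ["hello world", "hi there"])
# print(cosine(vecs[0], vecs[1]))
```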
### Content Moderation
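A moderation model such as Llama Guard 3 is queried like any chat model; its reply convention (assumed here from the Llama Guard output format: `safe`, or `unsafe` followed by a category code) is parsed by hand:

```python
import json
import urllib.request

def classify(reply):
    """Parse a Llama Guard-style verdict: 'safe', or 'unsafe' + category line."""
    lines = reply.strip().splitlines()
    verdict = lines[0].strip()
    category = lines[1].strip() if verdict == "unsafe" and len(lines) > 1 else None
    return verdict == "safe", category

def moderate(model, text):
    """Send text to a moderation model and return (is_safe, category)."""
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": text}],
            "stream": False,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return classify(json.loads(resp.read())["message"]["content"])

# safe, category = moderate("llama-guard3", "How do I bake bread?")
```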
## Dynamic Model Loading
Load models on-demand.

### List Available Models
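Both operations map to simple REST calls: `GET /api/tags` lists installed models and `POST /api/pull` downloads one on demand (the `model` key follows the current API docs; older servers used `name`):

```python
import json
import urllib.request

def list_models():
    """GET /api/tags returns the locally installed models."""
    with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
        return json.loads(resp.read())

def model_names(tags):
    """Extract just the model tags from a /api/tags response."""
    return [m["name"] for m in tags.get("models", [])]

def pull(name):
    """POST /api/pull downloads a model (progress streams unless disabled)."""
    req = urllib.request.Request(
        "http://localhost:11434/api/pull",
        data=json.dumps({"model": name, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# if "llama3.2:latest" not in model_names(list_models()):
#     pull("llama3.2")
```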
## Advanced Configuration
### Custom Context Window
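The context window is overridden per request through `options.num_ctx`; a sketch (8192 is just an example value, and larger windows cost more RAM/VRAM):

```python
import json
import urllib.request

def build_with_context(model, prompt, num_ctx):
    """Chat body overriding the context window via options.num_ctx."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": {"num_ctx": num_ctx},
        "stream": False,
    }

def chat(payload):
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# print(chat(build_with_context("llama3.2", "Summarize this long document...", 8192)))
```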
### Custom Parameters
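Arbitrary runtime parameters (e.g. `num_gpu`, `num_thread`) go in the `options` object, while `keep_alive`, which controls how long the model stays loaded, is a top-level field. A minimal builder sketch (the example values are illustrative, not recommendations):

```python
def build_custom(model, prompt, *, keep_alive="10m", **options):
    """Chat body with arbitrary Ollama runtime options.

    Keyword options (num_gpu, num_thread, ...) land in `options`;
    `keep_alive` stays top-level per the REST API.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": options,
        "keep_alive": keep_alive,
        "stream": False,
    }

# POST this body to http://localhost:11434/api/chat:
# payload = build_custom("llama3.2", "Hi", num_gpu=32, num_thread=8)
```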
### Temperature and Options
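The common sampling knobs (`temperature`, `top_p`, `top_k`, `seed`) also go in `options`; a fixed seed makes sampling reproducible. A sketch:

```python
import json
import urllib.request

def build_sampling(model, prompt, temperature=0.8, top_p=0.9, top_k=40, seed=None):
    """Chat body with the common sampling options for /api/chat."""
    options = {"temperature": temperature, "top_p": top_p, "top_k": top_k}
    if seed is not None:
        options["seed"] = seed  # fixed seed -> reproducible sampling
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "options": options,
        "stream": False,
    }

def chat(payload):
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Low temperature for deterministic answers, higher for creative ones:
# print(chat(build_sampling("llama3.2", "Name three colors.", temperature=0.2, seed=42)))
```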
## Model Capabilities
| Model | Context | Tools | Vision | Moderation | Speed |
|---|---|---|---|---|---|
| Llama 3.2 | 131K | ✅ | ❌ | ❌ | Fast |
| Llama 4 | 10M | ✅ | ❌ | ❌ | Medium |
| Qwen 2.5 | 32K | ✅ | ❌ | ❌ | Fast |
| Granite Vision | 16K | ✅ | ✅ | ❌ | Medium |
| Llama Guard 3 | 131K | ❌ | ❌ | ✅ | Fast |
## Best Practices
- Start with smaller models during development (3B-8B parameters)
- Use tool-optimized models (Groq variants) for function calling
- Pull models in advance - downloading can take time
- Adjust context window based on your use case
- Monitor resource usage - larger models need more RAM/VRAM
- Use GPU acceleration for better performance
## System Requirements

### RAM Requirements
- 7B models: 8GB RAM minimum
- 13B models: 16GB RAM minimum
- 33B+ models: 32GB RAM minimum
- 70B models: 64GB RAM minimum
### GPU Acceleration
Ollama automatically uses GPU if available:

- NVIDIA: CUDA support
- Apple: Metal acceleration on M1/M2/M3
- AMD: ROCm support (Linux)
## Troubleshooting
### Ollama Not Running
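A quick health check from the shell:

```shell
# Prints "Ollama is running" when the server is up
curl http://localhost:11434

# Start the server if it is not running
ollama serve
```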
### Model Not Found
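Check what is installed and pull the missing model (the tag is an example):

```shell
# List locally installed models
ollama list

# Pull the model you need
ollama pull llama3.2
```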
### Out of Memory
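If a model exceeds available RAM/VRAM, the usual fixes are switching to a smaller variant (tags below are examples) or lowering the context window:

```shell
# Smaller variant of the same family
ollama pull llama3.2:1b
ollama run llama3.2:1b

# Also consider reducing num_ctx in your request options
```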
### Slow Performance
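First confirm the model is actually running on the GPU rather than the CPU:

```shell
# Shows loaded models and whether they run on GPU or CPU
ollama ps
```

If it reports CPU, check drivers (CUDA/ROCm) or free up VRAM; otherwise fall back to a smaller model.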
## Docker Deployment
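The official `ollama/ollama` image exposes the same API on port 11434; model data persists in a named volume:

```shell
# CPU-only container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# With NVIDIA GPUs (requires the NVIDIA Container Toolkit)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull a model inside the container
docker exec -it ollama ollama pull llama3.2
```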
## Advantages
- Free: No API costs
- Private: Data never leaves your machine
- Offline: Works without internet
- Fast iteration: No rate limits
- Full control: Choose any open-source model
## Limitations
- Requires local resources: RAM/GPU
- Slower than cloud APIs: Depends on hardware
- Model quality varies: Not as capable as GPT-4/Claude
- Manual model management: Need to pull/update models