The demo uses Llama 3.1 8B by default, but you can easily switch to different models based on your hardware constraints or quality requirements.
Available model options
Ollama provides a wide range of open source models. Here are recommended options:

Smaller/faster models
Ideal for limited hardware or faster inference:

| Model | Size | Speed | Use case |
|---|---|---|---|
| `llama3.2:3b` | 3B | Very fast | Quick responses, limited hardware |
| `phi3:mini` | 3.8B | Fast | General use on any laptop |
| `gemma2:2b` | 2B | Very fast | Maximum speed, basic tasks |

Balanced models
Good quality with reasonable resource requirements:

| Model | Size | Speed | Use case |
|---|---|---|---|
| `llama3.1` | 8B | Medium | Default - best balance (demo uses this) |
| `llama3.2` | 3B | Very fast | Newer Llama release; the default tag is the same 3B model as `llama3.2:3b` |
| `mistral` | 7B | Medium | Strong general-purpose model |

Larger/better models
Require more RAM but provide higher quality:

| Model | Size | RAM needed | Use case |
|---|---|---|---|
| `llama3.1:70b` | 70B | ~40GB | Best quality, needs powerful hardware |
| `mixtral` | 47B | ~32GB | Strong reasoning, mixture of experts |

Specialized models
Optimized for specific tasks:

| Model | Size | Specialization |
|---|---|---|
| `deepseek-r1:7b` | 7B | Strong reasoning tasks (MIT license) |
| `codellama` | 7B-34B | Code generation and analysis |

Apple Silicon Macs (M1/M2/M3/M4) handle 8B models beautifully at 15-25 tokens/second. CPU-only machines work fine at ~3-5 tokens/second.
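
To put those throughput figures in perspective, here is a rough back-of-the-envelope sketch of how long a single answer takes to generate. The 200-token answer length is an assumption chosen only for illustration, and the estimate ignores prompt processing and model load time.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Estimate generation time from answer length and throughput."""
    return num_tokens / tokens_per_second


ANSWER_TOKENS = 200  # assumed answer length, for illustration only

# Throughput ranges (tokens/second) quoted above for 8B models.
throughput = {"Apple Silicon": (15, 25), "CPU-only": (3, 5)}

for label, (low, high) in throughput.items():
    fastest = seconds_to_generate(ANSWER_TOKENS, high)
    slowest = seconds_to_generate(ANSWER_TOKENS, low)
    print(f"{label}: {fastest:.0f}-{slowest:.0f} s for {ANSWER_TOKENS} tokens")
```

Under these assumptions the estimate is about 8-13 seconds on Apple Silicon and 40-67 seconds on a CPU-only machine.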
Pull a new model
Download a model with `ollama pull` before using it, for example `ollama pull llama3.2:3b`.

Update the scripts
Once you’ve pulled a new model, update the scripts to use it.

Step 1: Basic local AI (demo_step1_ollama.py)
Change the model name in `scripts/demo_step1_ollama.py`.
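
The script itself is not reproduced here, so the following is only a minimal sketch of what a model swap can look like. It assumes a script that calls Ollama's local HTTP API on its default port (11434). The names `MODEL`, `build_payload`, and `generate`, as well as the 120-second default timeout, are hypothetical, and the real demo script may be structured differently.

```python
import json
import urllib.request

# Hypothetical constant: the single place where the model name is set.
MODEL = "llama3.2:3b"  # previously "llama3.1", the demo default
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for one non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = MODEL, timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    # The timeout is an assumed value; larger models may need more.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the Ollama server running and the model already pulled, calling `generate("Summarize this ordinance.")` would return the model's reply as a string.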

Step 2: RAG with civic data (demo_step2_rag.py)
Change the model name in `scripts/demo_step2_rag.py`.
The embedding model (`all-MiniLM-L6-v2`) should generally stay the same; it’s used for search, not generation.

Step 3: Web application (demo_step3_app.py)
Change the model name in `scripts/demo_step3_app.py`.

Step 4 & 5: BYOD scripts
The BYOD scripts support a `--model` flag, so no code changes are needed: pass the model name on the command line.
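
For readers curious how such a flag usually works, here is a minimal `argparse` sketch. It is not taken from the BYOD scripts: the `parse_args` helper and the script name in the comment are hypothetical, and only the `--model` flag name and the `llama3.1` default come from this page.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; --model overrides the default model name."""
    parser = argparse.ArgumentParser(description="Hypothetical BYOD-style script")
    parser.add_argument(
        "--model",
        default="llama3.1",  # the demo's default model
        help="Name of an Ollama model that has already been pulled",
    )
    return parser.parse_args(argv)


# Equivalent to running: python some_script.py --model llama3.2:3b
args = parse_args(["--model", "llama3.2:3b"])
print(f"Using model: {args.model}")
```

Omitting the flag falls back to the default, so existing invocations keep working unchanged.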
Performance considerations
Hardware requirements
| Component | 3B model | 8B model | 70B model |
|---|---|---|---|
| RAM | 4GB+ | 8GB+ | 40GB+ |
| GPU VRAM | Optional | Optional | Recommended 24GB+ |
| Storage | 2GB | 5GB | 40GB |
| Speed (CPU) | 8-15 tok/s | 3-8 tok/s | <1 tok/s |
| Speed (GPU) | 30-60 tok/s | 15-25 tok/s | 5-10 tok/s |
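
The RAM row of the table can be turned into a small lookup, sketched below. The `suggest_model` helper is hypothetical, and mapping the 3B, 8B, and 70B columns to the tags `llama3.2:3b`, `llama3.1`, and `llama3.1:70b` is an assumption based on the model lists earlier on this page.

```python
from typing import Optional

# (minimum RAM in GB, model tag), largest first, from the RAM row of the table.
RAM_REQUIREMENTS = [
    (40, "llama3.1:70b"),
    (8, "llama3.1"),
    (4, "llama3.2:3b"),
]


def suggest_model(ram_gb: float) -> Optional[str]:
    """Return the largest listed model whose minimum RAM fits, or None."""
    for min_ram_gb, model in RAM_REQUIREMENTS:
        if ram_gb >= min_ram_gb:
            return model
    return None  # below 4 GB, none of the listed models is recommended


print(suggest_model(16))
```

A 16 GB laptop maps to `llama3.1`, the 8B default, because it falls short of the 40 GB needed for the 70B model.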
Timeout settings
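Larger models may need longer timeouts; raise `request_timeout` in the scripts when you switch to one (for example, `request_timeout=300.0` for a 70B model).

Model selection guide
Choose based on your priorities:

Optimize for speed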
If you need fast responses or have limited hardware, pull `llama3.2:3b` and update the scripts to use `model="llama3.2:3b"`.

Optimize for quality
If you have powerful hardware and want the best results, pull `llama3.1:70b`, update the scripts to use `model="llama3.1:70b"`, and increase the timeout to `request_timeout=300.0`.

Check available models
See which models you have downloaded with `ollama list`.

Next steps
Swap in your own data
Replace the demo datasets with your files
Customize the UI
Modify the Gradio interface appearance