System requirements
Before you begin, verify your system meets these requirements:
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB |
| GPU VRAM | Not required (CPU works) | 8+ GB (much faster) |
| Storage | 10 GB free | 20 GB free |
| CPU | Any modern 4-core | Apple Silicon or recent Intel/AMD |
| Python | 3.10+ | 3.12+ |
Apple Silicon Macs (M1/M2/M3/M4) are ideal: the unified memory architecture runs Llama 3.1 8B at 15-25 tokens/second.
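The software-visible requirements can be checked with a short script. The following is a minimal sketch, assuming the thresholds from the "Minimum" column above; `check_requirements` is an illustrative helper, not part of the demo. RAM and GPU VRAM are not checked because the standard library has no portable way to read them.

```python
import os
import shutil
import sys

# Thresholds copied from the "Minimum" column of the requirements table.
MIN_PYTHON = (3, 10)
MIN_FREE_GB = 10
MIN_CORES = 4


def check_requirements(path="."):
    """Return a dict mapping each check name to True (pass) or False (fail)."""
    free_gb = shutil.disk_usage(path).free / 1e9  # decimal gigabytes
    cores = os.cpu_count() or 0  # logical cores; cpu_count() may return None
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "storage": free_gb >= MIN_FREE_GB,
        "cpu": cores >= MIN_CORES,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'OK' if ok else 'below minimum'}")
```

Note that `os.cpu_count()` reports logical cores, so a 2-core CPU with hyper-threading would pass the CPU check.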
Installation steps
Install Ollama
Ollama is the local LLM runtime that hosts models on your machine.
macOS
Linux
Windows
Download the installer from ollama.com.
Ollama runs as a background service after installation. If it’s not running, start it with `ollama serve`.
Pull the Llama 3.1 model
Download the model with `ollama pull llama3.1` (this is a one-time download of ~4.7 GB).
To verify it works, run the model with a short prompt; you should see the model generate a brief greeting.
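The greeting check can also be scripted against Ollama's local HTTP API. The sketch below assumes Ollama is listening on its default address `http://localhost:11434` and that the model is available under the name `llama3.1`; the helper names and the prompt are illustrative.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed unchanged)


def build_generate_request(prompt, model="llama3.1"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.1", timeout=120):
    """Return the model's reply, or None if the request fails for any reason."""
    request = build_generate_request(prompt, model)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.load(resp).get("response")
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    reply = generate("Say hello in one short sentence.")
    if reply is None:
        print("No reply; check that Ollama is running and the model is pulled.")
    else:
        print(reply)
```

Setting `"stream": False` makes Ollama return one JSON object instead of a stream of partial responses, which keeps the parsing simple.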
Set up Python virtual environment
Create an isolated Python environment:
You’ll need to activate the virtual environment every time you open a new terminal session.
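Because a forgotten activation is easy to miss, a script can check for it at startup. This is a minimal sketch using only the standard library; `in_virtualenv` is an illustrative helper, not part of the demo scripts.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate it before running the demo.")
```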
Install Python dependencies
Upgrade pip and install all required packages. This installs five packages:
| Package | Version | Purpose |
|---|---|---|
| `llama-index` | >=0.12.0 | Core RAG framework for connecting AI to data |
| `llama-index-llms-ollama` | >=0.5.0 | Ollama integration for local LLM inference |
| `llama-index-embeddings-huggingface` | >=0.5.0 | Local embedding model (no API key needed) |
| `llama-index-readers-file` | >=0.5.0 | File readers for PDF, DOCX, and other formats |
| `gradio` | >=5.0.0 | Web UI framework for building the chat interface |
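To confirm that the installed versions meet the minimums in the table, one option is to query package metadata from Python. This is a sketch under stated assumptions: the version comparison is deliberately simplified (numeric `major.minor.patch` only), and `check_packages` and `version_tuple` are illustrative names.

```python
from importlib import metadata

# Minimum versions copied from the table above.
REQUIRED = {
    "llama-index": "0.12.0",
    "llama-index-llms-ollama": "0.5.0",
    "llama-index-embeddings-huggingface": "0.5.0",
    "llama-index-readers-file": "0.5.0",
    "gradio": "5.0.0",
}


def version_tuple(version):
    """Turn '0.12.3' into (0, 12, 3), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad '1.0' to (1, 0, 0) so tuples compare fairly
        parts.append(0)
    return tuple(parts)


def check_packages(required=REQUIRED):
    """Return {package: status string} for every required package."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = f"{installed} ({'ok' if ok else 'needs >=' + minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_packages().items():
        print(f"{name}: {status}")
```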
Verify installation
Run these checks to ensure everything is working:
Check Ollama is running
Run `ollama list`; you should see `llama3.1` in the list.
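To confirm from Python that `llama3.1` is in the list of local models, one option is Ollama's `/api/tags` endpoint, which returns the locally available models as JSON. The sketch assumes the default address `http://localhost:11434`; the helper names are illustrative.

```python
import json
import urllib.error
import urllib.request


def model_is_listed(tags_payload, model="llama3.1"):
    """Check a parsed /api/tags response for a model, with or without a tag suffix."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    return any(name == model or name.startswith(model + ":") for name in names)


def fetch_tags(base_url="http://localhost:11434", timeout=5):
    """Return the parsed /api/tags response, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    tags = fetch_tags()
    if tags is None:
        print("Ollama is not reachable on localhost:11434.")
    elif model_is_listed(tags):
        print("llama3.1 is available.")
    else:
        print("Ollama is running, but llama3.1 has not been pulled.")
```

Matching on the `llama3.1:` prefix accounts for Ollama reporting names with a tag suffix such as `llama3.1:latest`.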
Check Python packages
Run a quick test
Project structure
After cloning, your directory will look like this:
Understanding the dependencies
llama-index
The core RAG (Retrieval-Augmented Generation) framework. Handles:
- Loading and chunking documents
- Building vector indexes
- Retrieving relevant context for queries
- Orchestrating LLM + data pipelines
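To make the chunking step concrete, here is a simplified, dependency-free sketch of splitting text into overlapping chunks. It illustrates the idea only: LlamaIndex's own splitters are more sophisticated, and the character-based approach, the `chunk_text` name, and the default sizes are assumptions made for this example.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into character chunks where each chunk repeats the last
    `overlap` characters of the previous one, so context spans boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval find it.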
llama-index-llms-ollama
Integration layer between LlamaIndex and Ollama. Lets you use local Ollama models as the LLM backend.
llama-index-embeddings-huggingface
Provides local embedding models from HuggingFace. The demo uses `all-MiniLM-L6-v2`, which:
- Runs on CPU (no GPU required)
- Downloads ~80 MB on first use
- Converts text into vectors for semantic search
- Works completely offline after download
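Semantic search on vectors usually comes down to comparing a query vector with each document vector. The sketch below shows cosine similarity on made-up 3-dimensional vectors; real `all-MiniLM-L6-v2` embeddings have 384 dimensions, and the document ids and numbers here are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    return sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy vectors standing in for embeddings; the values are invented.
    docs = {
        "budget": [0.9, 0.1, 0.0],
        "transit": [0.1, 0.9, 0.2],
        "parks": [0.0, 0.2, 0.9],
    }
    print(rank_documents([0.8, 0.2, 0.1], docs))
```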
llama-index-readers-file
File readers for multiple formats:
- `.txt` (plain text)
- `.pdf` (PDF documents)
- `.csv` (CSV spreadsheets)
- `.docx` (Word documents)
gradio
Web UI framework for building the chat interface. Provides:
- Chat components
- File upload
- Dropdown selectors
- Example buttons
- Theming and CSS customization
Alternative models
You can swap Llama 3.1 for other models based on your hardware:
Smaller/faster models
Larger/better models
Best for reasoning
Troubleshooting
Ollama issues
Ollama isn't responding
Symptom: Scripts hang or show connection errors
Solution: Start the Ollama service with `ollama serve`.
Model not found
Symptom: Error message says model doesn’t exist
Solution: Pull the model again with `ollama pull llama3.1`.
Connection refused
Symptom: `Connection refused to localhost:11434`
Solution:
- Ensure no firewall is blocking port 11434
- Check that Ollama is running: `ollama serve`
- Verify Ollama is listening: `curl http://localhost:11434`
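The listening check can also be done from Python with a plain TCP connection attempt. This is a minimal sketch; `port_is_open` is an illustrative helper, and the defaults assume Ollama's port 11434 on the local machine.

```python
import socket


def port_is_open(host="localhost", port=11434, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on localhost:11434.")
    else:
        print("Nothing is listening on localhost:11434; try `ollama serve`.")
```

A successful connection only shows that some process is listening on the port; it does not prove that the process is Ollama.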
Performance issues
Very slow generation
Symptom: Tokens appear very slowly (< 3 per second)
Solution:
- Close other applications to free RAM
- Try a smaller model: `ollama pull llama3.2:3b`
- Check Activity Monitor/Task Manager for memory usage
- CPU-only inference is inherently slower; expect 3-8 tok/sec
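To measure generation speed rather than estimate it by eye, the timing fields in an Ollama `/api/generate` response can be used: `eval_count` is the number of generated tokens and `eval_duration` is the generation time in nanoseconds. The sketch below assumes a non-streaming response already parsed into a dict; the sample values are invented.

```python
def tokens_per_second(response):
    """Compute generation speed from a parsed Ollama /api/generate response.

    Returns 0.0 when the timing fields are missing or zero.
    """
    count = response.get("eval_count", 0)
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return count / (duration_ns / 1e9)  # nanoseconds to seconds


if __name__ == "__main__":
    # Invented sample: 100 tokens generated in 5 seconds.
    sample = {"eval_count": 100, "eval_duration": 5_000_000_000}
    print(f"{tokens_per_second(sample):.1f} tok/sec")
```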
Index building is slow
Symptom: Step 2 hangs at “Building vector index”
Solution:
- First run downloads the embedding model (~80 MB)
- Check internet connection
- Subsequent runs use cache and are instant
- Cached location: `~/.cache/huggingface/hub/`
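Whether the embedding model is already cached can be checked by looking for its folder in the hub cache. This sketch assumes the model is fetched under the repository id `sentence-transformers/all-MiniLM-L6-v2` (the source names only `all-MiniLM-L6-v2`) and that the cache is in its default location; `embedding_model_cached` is an illustrative helper.

```python
from pathlib import Path

# Default hub cache location; it can be moved with environment variables such as HF_HOME.
DEFAULT_HUB_CACHE = Path.home() / ".cache" / "huggingface" / "hub"


def embedding_model_cached(
    repo_id="sentence-transformers/all-MiniLM-L6-v2",  # assumed repository id
    cache_dir=DEFAULT_HUB_CACHE,
):
    """Return True if the hub cache holds a non-empty folder for repo_id.

    The hub cache stores each model under 'models--<org>--<name>'.
    """
    folder = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    return folder.is_dir() and any(folder.iterdir())


if __name__ == "__main__":
    if embedding_model_cached():
        print("Embedding model is cached.")
    else:
        print("Embedding model not cached yet; the first run will download it.")
```

A non-empty folder is only a rough signal: an interrupted download can also leave files behind.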
App is unresponsive
Symptom: Gradio app freezes or crashes
Solution:
- Ensure at least 8 GB RAM is available
- Llama 3.1 8B uses ~4-5 GB when loaded
- Close browser tabs and other apps
- Restart the app
Python issues
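Module not found
Symptom: `ModuleNotFoundError: No module named 'llama_index'`
Solution: Activate the virtual environment, then reinstall the Python dependencies.
Wrong Python version
Symptom: `SyntaxError` or version mismatch
Solution: Use Python 3.10 or newer, as listed under System requirements.
Gradio issues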
Browser doesn't open
Symptom: Script runs but browser doesn’t launch
Solution:
- Navigate manually to `http://localhost:7860`
- Or set the browser environment variable: `BROWSER=chrome python scripts/demo_step3_app.py`
Port already in use
Symptom: `OSError: [Errno 48] Address already in use`
Solution: Stop the process that is already using the port (7860 for the Gradio app), or launch the app on a different port.
Hardware performance guide
Apple Silicon (M1/M2/M3/M4)
- Expected speed: 15-25 tokens/second
- Memory usage: ~5-6 GB for Llama 3.1 8B
- Recommendation: Ideal for this demo
Intel/AMD with GPU
- Expected speed: 10-20 tokens/second (with 8+ GB VRAM)
- Memory usage: ~4-5 GB GPU VRAM
- Recommendation: Excellent performance
Intel/AMD CPU only
- Expected speed: 3-8 tokens/second
- Memory usage: ~8-10 GB RAM
- Recommendation: Slower but still usable for demos
Memory requirements by model
| Model | Size | RAM Needed | Speed (Apple Silicon) |
|---|---|---|---|
| `llama3.2:3b` | ~3B params | 4-6 GB | 25-35 tok/sec |
| `llama3.1` (8B) | ~8B params | 8-10 GB | 15-25 tok/sec |
| `llama3.1:70b` | ~70B params | 40+ GB | 3-5 tok/sec |
| `phi3:mini` | ~3.8B params | 4-6 GB | 20-30 tok/sec |
| `deepseek-r1:7b` | ~7B params | 8-10 GB | 12-20 tok/sec |
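The table can be turned into a small lookup for choosing a model. This is a sketch, assuming the lower bound of each "RAM Needed" range is the cutoff; `models_that_fit` is an illustrative helper and the cutoff rule is a simplification.

```python
# (model tag, minimum RAM in GB), using the lower bound of the "RAM Needed" column above.
MODELS = [
    ("llama3.2:3b", 4),
    ("phi3:mini", 4),
    ("deepseek-r1:7b", 8),
    ("llama3.1", 8),
    ("llama3.1:70b", 40),
]


def models_that_fit(available_ram_gb):
    """Return the model tags whose minimum RAM need is within available_ram_gb."""
    return [tag for tag, min_ram in MODELS if min_ram <= available_ram_gb]


if __name__ == "__main__":
    for ram in (4, 8, 16, 64):
        print(f"{ram} GB: {', '.join(models_that_fit(ram))}")
```

Leaving headroom above the minimum is sensible, since the operating system, the embedding model, and the browser also need memory.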
Pre-demo checklist
Before presenting or running at an event:
- Ollama is running (`ollama list` shows `llama3.1`)
- All pre-warm commands have been run
- Embedding model is cached (~80 MB, check `~/.cache/huggingface/hub/`)
- Terminal font size is large enough for the back row
- Other applications are closed to free memory
- Virtual environment is activated
- All scripts have been tested at least once
Next steps
Run the quickstart
Get the demo running in 10 minutes
Understand the architecture
Learn how the RAG pipeline works
Explore the datasets
See what civic data is included
Customize your app
Change prompts, add tracks, and deploy