System requirements

Before you begin, verify your system meets these requirements:
| Component | Minimum | Recommended |
| --- | --- | --- |
| RAM | 8 GB | 16 GB |
| GPU VRAM | Not required (CPU works) | 8+ GB (much faster) |
| Storage | 10 GB free | 20 GB free |
| CPU | Any modern 4-core | Apple Silicon or recent Intel/AMD |
| Python | 3.10+ | 3.12+ |
Apple Silicon Macs (M1/M2/M3/M4) are ideal: their unified memory architecture runs Llama 3.1 8B at 15-25 tokens/second.
No GPU? The demo still works on CPU, just slower (roughly 3-8 tokens/second). For live demos this is fine, because the audience can watch the text being generated in real time.
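The Python, storage, and CPU rows of the requirements table can be checked with a short standard-library script. This is a minimal sketch and not part of the demo repository; the function name `check_system` is illustrative. RAM and GPU VRAM are left out because the standard library has no portable way to read them, and `os.cpu_count()` reports logical cores rather than physical ones.

```python
import os
import shutil
import sys

# Minimums taken from the requirements table.
MIN_PYTHON = (3, 10)
MIN_FREE_GB = 10
MIN_CORES = 4


def check_system(path="."):
    """Return a dict of requirement name -> (passed, detail)."""
    free_gb = shutil.disk_usage(path).free / 1e9
    cores = os.cpu_count() or 0  # cpu_count() can return None
    version = sys.version_info[:2]
    return {
        "python": (version >= MIN_PYTHON, f"{version[0]}.{version[1]}"),
        "storage": (free_gb >= MIN_FREE_GB, f"{free_gb:.1f} GB free"),
        "cpu": (cores >= MIN_CORES, f"{cores} logical cores"),
    }


if __name__ == "__main__":
    for name, (passed, detail) in check_system().items():
        print(f"{'OK  ' if passed else 'FAIL'} {name}: {detail}")
```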

Installation steps

1. Install Ollama

Ollama is the local LLM runtime that hosts models on your machine.

macOS

brew install ollama

Linux

curl -fsSL https://ollama.com/install.sh | sh

Windows

Download the installer from ollama.com
Ollama runs as a background service after installation. If it’s not running, start it with ollama serve.
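To confirm from Python that the background service is answering, a plain HTTP request to the local port is enough. The sketch below assumes Ollama's default address, http://localhost:11434; the helper name `ollama_is_up` is illustrative and not part of the demo scripts.

```python
import http.client
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, timeouts, and refused connections are all OSError subclasses.
        return False


if __name__ == "__main__":
    if ollama_is_up():
        print("Ollama is reachable")
    else:
        print("Ollama is not reachable; try starting it with: ollama serve")
```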
2. Pull the Llama 3.1 model

Download the model (this is a one-time download of ~4.7 GB):
ollama pull llama3.1
Verify it works:
ollama run llama3.1 "Say hello in 10 words or less"
You should see the model generate a brief greeting.
Do this on reliable Wi-Fi BEFORE the event: the download is ~4.7 GB. Once downloaded, the model runs entirely offline.
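Whether the model is already on disk can also be checked programmatically. Ollama's local REST API lists installed models at `/api/tags`; the sketch below parses that response. The helper names are illustrative, and it assumes the default address http://localhost:11434. A name without a tag is treated as `<name>:latest`, which is how Ollama resolves it.

```python
import json
import urllib.request


def normalize(name):
    """Ollama treats a model name without a tag as '<name>:latest'."""
    return name if ":" in name else f"{name}:latest"


def has_model(tags_payload, wanted="llama3.1"):
    """Check a parsed /api/tags payload for the wanted model."""
    models = tags_payload.get("models", [])
    names = {normalize(m.get("name", "")) for m in models}
    return normalize(wanted) in names


def fetch_tags(base_url="http://localhost:11434", timeout=5.0):
    """Download and parse the list of locally installed models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        print("llama3.1 installed:", has_model(fetch_tags()))
    except (OSError, ValueError) as exc:
        print("Could not query Ollama:", exc)
```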
3. Clone the repository

Get the demo code:
git clone https://github.com/holzerjm/civichacks-demo.git
cd civichacks-demo
4. Set up Python virtual environment

Create an isolated Python environment:
python3 -m venv .venv
source .venv/bin/activate
You’ll need to activate the virtual environment every time you open a new terminal session.
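A forgotten activation is easy to detect from inside Python: in a `venv` environment, `sys.prefix` points at the environment while `sys.base_prefix` points at the interpreter it was created from. The helper below is a small illustrative sketch, not something the demo scripts include.

```python
import sys


def in_virtualenv():
    """True when the interpreter is running inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run: source .venv/bin/activate")
```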
5. Install Python dependencies

Upgrade pip and install all required packages:
pip install --upgrade pip
pip install -r requirements.txt
This installs five packages:
| Package | Version | Purpose |
| --- | --- | --- |
| llama-index | >=0.12.0 | Core RAG framework for connecting AI to data |
| llama-index-llms-ollama | >=0.5.0 | Ollama integration for local LLM inference |
| llama-index-embeddings-huggingface | >=0.5.0 | Local embedding model (no API key needed) |
| llama-index-readers-file | >=0.5.0 | File readers for PDF, DOCX, and other formats |
| gradio | >=5.0.0 | Web UI framework for building the chat interface |
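The minimum versions in the table can be compared against what is installed using `importlib.metadata` from the standard library. This is a rough sketch: the version comparison is deliberately simple and ignores pre-release suffixes such as `rc1`, so a dedicated parser like `packaging.version` would be more robust.

```python
from importlib import metadata

# Minimum versions from the package table.
REQUIRED = {
    "llama-index": "0.12.0",
    "llama-index-llms-ollama": "0.5.0",
    "llama-index-embeddings-huggingface": "0.5.0",
    "llama-index-readers-file": "0.5.0",
    "gradio": "5.0.0",
}


def as_tuple(version):
    """Turn '0.12.5' into (0, 12, 5); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare dotted versions numerically, padding the shorter with zeros."""
    a, b = as_tuple(installed), as_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def report():
    """Print one status line per required package."""
    for name, minimum in REQUIRED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"MISSING {name} (need >={minimum})")
            continue
        status = "OK     " if meets_minimum(installed, minimum) else "TOO OLD"
        print(f"{status} {name} {installed} (need >={minimum})")


if __name__ == "__main__":
    report()
```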
6. Pre-warm everything (critical for live demos)

First runs are slower because the models need to load into memory and the embedding model (~80 MB) downloads on first use. Run each step once before presenting:
# Pre-warm Step 1
python scripts/demo_step1_ollama.py

# Pre-warm Step 2 (each track)
python scripts/demo_step2_rag.py eco
python scripts/demo_step2_rag.py city
python scripts/demo_step2_rag.py edu
python scripts/demo_step2_rag.py justice

# Pre-warm Step 3 (start it, verify it loads, then Ctrl+C)
python scripts/demo_step3_app.py
The HuggingFace embedding model (all-MiniLM-L6-v2, ~80 MB) downloads on first use. Running Step 2 once will cache it to ~/.cache/huggingface/hub/.
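The Step 1 and Step 2 pre-warm commands can be chained from a small wrapper so that none is forgotten. This is a sketch under two assumptions: it is run from the repository root, and both scripts exit on their own when finished. Step 3 is left out on purpose because it starts a web server that has to be stopped with Ctrl+C.

```python
import subprocess
import sys
from pathlib import Path

# Track names as used by the pre-warm commands.
TRACKS = ["eco", "city", "edu", "justice"]


def prewarm_commands(python=sys.executable):
    """Build the Step 1 and Step 2 pre-warm commands in presentation order."""
    commands = [[python, "scripts/demo_step1_ollama.py"]]
    commands += [[python, "scripts/demo_step2_rag.py", track] for track in TRACKS]
    return commands


def run_prewarm():
    """Run each command in turn, stopping at the first failure."""
    for command in prewarm_commands():
        if not Path(command[1]).exists():
            print(f"Skipping {command[1]}: run this from the repository root")
            continue
        print("Running:", " ".join(command[1:]))
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run_prewarm()
```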

Verify installation

Run these checks to ensure everything is working:

Check Ollama is running

ollama list
You should see llama3.1 in the list.

Check Python packages

pip list | grep llama-index
pip list | grep gradio
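You should see the five packages from requirements.txt. pip may also list additional llama-index-* packages that were installed as dependencies.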

Run a quick test

python scripts/demo_step1_ollama.py
If you see the AI generate a response with a cost comparison, you’re all set!

Project structure

After cloning, your directory will look like this:
civichacks-demo/
├── README.md                             # Project overview and demo flow
├── USER_GUIDE.md                         # Comprehensive guide
├── requirements.txt                      # Python dependencies
├── data/                                 # Civic datasets (one per track)
│   ├── ecohack_boston_environment.txt    # Boston environmental quality data
│   ├── cityhack_boston_311.txt           # Boston 311 service request data
│   ├── eduhack_boston_schools.txt        # Boston public schools equity data
│   └── justicehack_ma_justice.txt        # MA criminal justice reform data
├── userdata/                             # Drop your own files here for Step 4
└── scripts/                              # Demo scripts (run in order)
    ├── cost_estimator.py                 # Shared: local vs. cloud cost comparison
    ├── demo_step1_ollama.py              # Step 1: Basic local AI inference
    ├── demo_step2_rag.py                 # Step 2: RAG with civic data
    ├── demo_step3_app.py                 # Step 3: Full Gradio web app
    ├── demo_step4_byod.py                # Step 4: Bring Your Own Data (interactive)
    └── demo_step5_byod_app.py            # Step 5: BYOD Web Application (Gradio)

Understanding the dependencies

llama-index

The core RAG (Retrieval Augmented Generation) framework. Handles:
  • Loading and chunking documents
  • Building vector indexes
  • Retrieving relevant context for queries
  • Orchestrating LLM + data pipelines
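To make "chunking" concrete, here is a toy splitter that cuts text into fixed-size character windows with some overlap, so that a sentence falling on a boundary still appears whole in at least one chunk. It is purely illustrative: LlamaIndex ships its own splitters that work on sentences and tokens, and the function name and default sizes here are made up for the example.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```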

llama-index-llms-ollama

Integration layer between LlamaIndex and Ollama. Lets you use local Ollama models as the LLM backend.

llama-index-embeddings-huggingface

Provides local embedding models from HuggingFace. The demo uses all-MiniLM-L6-v2, which:
  • Runs on CPU (no GPU required)
  • Downloads ~80 MB on first use
  • Converts text into vectors for semantic search
  • Works completely offline after download
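Semantic search on vectors usually comes down to a similarity score between the query vector and each document vector. The sketch below shows cosine similarity on hand-made 3-dimensional vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions, and the numbers and labels here are invented purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made vectors standing in for real embeddings.
documents = {
    "air quality report": [0.9, 0.1, 0.0],
    "school enrollment data": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "pollution levels"

# Retrieval picks the document whose vector points in the most similar direction.
best = max(documents, key=lambda name: cosine_similarity(query, documents[name]))
print(best)
```

With these toy numbers the query lands closest to the air quality document, which is the behavior a retriever relies on when it selects context for the LLM.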

llama-index-readers-file

File readers for multiple formats:
  • .txt (plain text)
  • .pdf (PDF documents)
  • .csv (CSV spreadsheets)
  • .docx (Word documents)

gradio

Web UI framework for building the chat interface. Provides:
  • Chat components
  • File upload
  • Dropdown selectors
  • Example buttons
  • Theming and CSS customization

Alternative models

You can swap Llama 3.1 for other models based on your hardware:

Smaller/faster models

# 3.8B parameters, runs on almost anything
ollama pull phi3:mini

# 3B parameters, very fast
ollama pull llama3.2:3b

Larger/better models

# Needs ~40GB RAM, but incredible quality
ollama pull llama3.1:70b

Best for reasoning

# Strong reasoning, MIT license
ollama pull deepseek-r1:7b
To use a different model, update the model name in the scripts:
Settings.llm = Ollama(model="phi3:mini")  # Change in demo_step2_rag.py and demo_step3_app.py
For Step 1:
stream = ollama.chat(model="phi3:mini", ...)  # Change in demo_step1_ollama.py
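If you switch models often, one option is to read the model name from an environment variable instead of editing each script. The sketch below is a suggestion only: `DEMO_MODEL` is a hypothetical variable name, and the scripts in the repository do not read it unless you add this yourself.

```python
import os

DEFAULT_MODEL = "llama3.1"


def resolve_model(environ=os.environ):
    """Pick the model name from DEMO_MODEL (hypothetical), else the default."""
    return environ.get("DEMO_MODEL", "").strip() or DEFAULT_MODEL


if __name__ == "__main__":
    print("Using model:", resolve_model())
```

In a modified script, the result could replace the hard-coded string, for example `Settings.llm = Ollama(model=resolve_model())`.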

Troubleshooting

Ollama issues

Symptom: Scripts hang or show connection errors
Solution:
# Check if Ollama is running
ollama list

# If not, start it
ollama serve
Symptom: Error message says model doesn’t exist
Solution:
# Pull the model
ollama pull llama3.1

# Verify it's installed
ollama list
Symptom: Connection refused to localhost:11434
Solution:
  • Ensure no firewall is blocking port 11434
  • Start Ollama if it is not already running: ollama serve
  • Verify Ollama is listening: curl http://localhost:11434 (it should reply "Ollama is running")

Performance issues

Symptom: Tokens appear very slowly (< 3 per second)
Solution:
  • Close other applications to free RAM
  • Try a smaller model: ollama pull llama3.2:3b
  • Check Activity Monitor/Task Manager for memory usage
  • CPU-only inference is inherently slower—expect 3-8 tok/sec
Symptom: Step 2 hangs at “Building vector index”
Solution:
  • The first run downloads the embedding model (~80 MB), so it can appear to hang
  • Check your internet connection
  • Subsequent runs load the model from the local cache and skip the download
  • Cache location: ~/.cache/huggingface/hub/
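To see whether the embedding model is already cached, you can look for its folder under the Hugging Face Hub cache, which stores each model as `models--<org>--<name>` with a `snapshots` subfolder. The sketch assumes the model is fetched as `sentence-transformers/all-MiniLM-L6-v2` (the organization prefix is an assumption, since only the short name appears in this guide) and that the cache is in its default location rather than one set through `HF_HOME`.

```python
from pathlib import Path

# Assumption: the demo loads the model under the sentence-transformers
# organization, so the cache folder is
# models--sentence-transformers--all-MiniLM-L6-v2.
REPO_ID = "sentence-transformers/all-MiniLM-L6-v2"


def cache_dir_for(repo_id, hub_root=None):
    """Return the folder the Hugging Face Hub cache uses for a model repo."""
    default_root = Path.home() / ".cache" / "huggingface" / "hub"
    root = Path(hub_root) if hub_root else default_root
    return root / ("models--" + repo_id.replace("/", "--"))


def is_cached(repo_id=REPO_ID, hub_root=None):
    """True if at least one downloaded snapshot exists for the repo."""
    snapshots = cache_dir_for(repo_id, hub_root) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


if __name__ == "__main__":
    print("Embedding model cached:", is_cached())
```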
Symptom: Gradio app freezes or crashes
Solution:
  • Ensure at least 8 GB RAM is available
  • Llama 3.1 8B uses ~4-5 GB when loaded
  • Close browser tabs and other apps
  • Restart the app

Python issues

Symptom: ModuleNotFoundError: No module named 'llama_index'
Solution:
# Ensure virtual environment is activated
source .venv/bin/activate  # macOS/Linux
.venv\Scripts\activate    # Windows

# Reinstall dependencies
pip install -r requirements.txt
Symptom: SyntaxError or version mismatch
Solution:
# Check Python version
python3 --version

# Must be 3.10 or higher
# If not, install Python 3.10+ and recreate venv

Gradio issues

Symptom: Script runs but browser doesn’t launch
Solution:
  • Navigate manually to http://localhost:7860
  • Or set browser env var: BROWSER=chrome python scripts/demo_step3_app.py
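Symptom: OSError: [Errno 48] Address already in use (Errno 98 on Linux)
Solution:
# Find the process using port 7860
lsof -i :7860  # macOS/Linux
netstat -ano | findstr :7860  # Windows

# Then stop it, using the PID from the output above
kill <PID>  # macOS/Linux
taskkill /PID <PID> /F  # Windows

# Or use a different port
python scripts/demo_step3_app.py --port 8080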
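Before picking a replacement port by hand, you can ask the operating system which ports are free. The sketch below tries to bind a socket to each candidate, starting at Gradio's default of 7860; the helper names are illustrative and not part of the demo scripts.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(start=7860, attempts=20):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    print("First free port from 7860:", first_free_port())
```

If you launch Gradio from your own code, the chosen port can be passed through the `server_port` argument of `launch()`.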

Hardware performance guide

Apple Silicon (M1/M2/M3/M4)

  • Expected speed: 15-25 tokens/second
  • Memory usage: ~5-6 GB for Llama 3.1 8B
  • Recommendation: Ideal for this demo

Intel/AMD with GPU

  • Expected speed: 10-20 tokens/second (with 8+ GB VRAM)
  • Memory usage: ~4-5 GB GPU VRAM
  • Recommendation: Excellent performance

Intel/AMD CPU only

  • Expected speed: 3-8 tokens/second
  • Memory usage: ~8-10 GB RAM
  • Recommendation: Slower but still usable for demos
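The speed ranges translate directly into waiting time: seconds equal tokens divided by tokens per second. The sketch below applies that to the three hardware classes above, assuming an answer of 150 tokens (an illustrative figure, not a measurement from the demo).

```python
def seconds_to_generate(tokens, tokens_per_second):
    """Time needed to stream `tokens` at a steady generation speed."""
    return tokens / tokens_per_second


# Speed ranges from the hardware guide (tokens/second).
SPEEDS = {
    "Apple Silicon": (15, 25),
    "Intel/AMD with GPU": (10, 20),
    "Intel/AMD CPU only": (3, 8),
}

ANSWER_TOKENS = 150  # assumed answer length, for illustration only

for hardware, (slow, fast) in SPEEDS.items():
    best = seconds_to_generate(ANSWER_TOKENS, fast)
    worst = seconds_to_generate(ANSWER_TOKENS, slow)
    print(f"{hardware}: {best:.0f}-{worst:.0f} s for {ANSWER_TOKENS} tokens")
```

Under that assumption, Apple Silicon needs about 6 to 10 seconds per answer, while CPU-only inference needs about 19 to 50 seconds, which is worth factoring into demo pacing.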

Memory requirements by model

| Model | Size | RAM Needed | Speed (Apple Silicon) |
| --- | --- | --- | --- |
| llama3.2:3b | ~3B params | 4-6 GB | 25-35 tok/sec |
| llama3.1 (8B) | ~8B params | 8-10 GB | 15-25 tok/sec |
| llama3.1:70b | ~70B params | 40+ GB | 3-5 tok/sec |
| phi3:mini | ~3.8B params | 4-6 GB | 20-30 tok/sec |
| deepseek-r1:7b | ~7B params | 8-10 GB | 12-20 tok/sec |
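The table can be turned into a quick lookup that lists the models a machine can load. This sketch uses the lower bound of each "RAM Needed" range as the threshold, which is an optimistic simplification; leave headroom for the operating system and other applications.

```python
# (model, minimum RAM in GB): lower bound of the "RAM Needed" column.
MODEL_RAM_GB = [
    ("llama3.2:3b", 4),
    ("llama3.1", 8),
    ("llama3.1:70b", 40),
    ("phi3:mini", 4),
    ("deepseek-r1:7b", 8),
]


def models_that_fit(ram_gb):
    """Return the models whose minimum RAM is within the given budget."""
    return [name for name, needed in MODEL_RAM_GB if needed <= ram_gb]


if __name__ == "__main__":
    for ram in (4, 8, 16, 64):
        print(f"{ram} GB: {', '.join(models_that_fit(ram))}")
```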

Pre-demo checklist

Before presenting or running at an event:
  • Ollama is running (ollama list shows llama3.1)
  • All pre-warm commands have been run
  • Embedding model is cached (~80 MB, check ~/.cache/huggingface/hub/)
  • Terminal font size is large enough for the back row
  • Other applications are closed to free memory
  • Virtual environment is activated
  • All scripts have been tested at least once
For live demos: Have a backup screen recording ready in case of hardware failure. Record it at the venue so the environment looks authentic.

Next steps

Run the quickstart

Get the demo running in 10 minutes

Understand the architecture

Learn how the RAG pipeline works

Explore the datasets

See what civic data is included

Customize your app

Change prompts, add tracks, and deploy
