Overview
This step connects the local AI to real civic data using Retrieval Augmented Generation (RAG). In ~15 lines of code, you can query real civic documents with a local AI: no APIs, no cost.

Duration: ~90 seconds

What you’ll learn:

- How to load civic datasets and build a vector search index
- How RAG grounds AI responses in actual data
- How to query across different civic data tracks
Prerequisites
Complete Step 1: Local AI with Ollama first, then install the RAG dependencies.

Available tracks
The demo includes four civic data tracks:

| Track | Key | Data file | Focus |
|---|---|---|---|
| 🌿 EcoHack | eco | ecohack_boston_environment.txt | Air quality, heat islands, climate resilience |
| 🏙️ CityHack | city | cityhack_boston_311.txt | 311 service requests, equity gaps |
| 📚 EduHack | edu | eduhack_boston_schools.txt | Achievement gaps, absenteeism, tech access |
| ⚖️ JusticeHack | justice | justicehack_ma_justice.txt | Incarceration disparities, policing data |
All datasets are synthetic but realistic — fabricated for demonstration purposes using real-world patterns.
Running the demo
Basic usage
Specific question
Each track has 3 pre-written questions, numbered 1-3.

All questions
Command-line options
| Option | Description |
|---|---|
| track | Hackathon track to query: eco, city, edu, justice (default: city) |
| question | Question number to ask (1-3). If omitted, picks a random question |
| --all | Run all 3 sample questions for the track |
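
The options in the table could be parsed roughly as follows. This is a minimal sketch using Python's standard `argparse`, not the demo script's actual code: the function names `build_parser` and `select_questions` and the attribute name `run_all` are assumptions made for illustration.

```python
import argparse
import random

# Track keys taken from the "Available tracks" table.
TRACK_KEYS = ["eco", "city", "edu", "justice"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Query a civic data track with local RAG")
    # Optional positional: falls back to "city" when omitted.
    parser.add_argument("track", nargs="?", choices=TRACK_KEYS, default="city",
                        help="Hackathon track to query")
    # Optional positional: None means "pick a random question".
    parser.add_argument("question", nargs="?", type=int, choices=[1, 2, 3], default=None,
                        help="Question number to ask (1-3)")
    parser.add_argument("--all", action="store_true", dest="run_all",
                        help="Run all 3 sample questions for the track")
    return parser


def select_questions(args: argparse.Namespace, questions: list) -> list:
    """Return the questions to run for the parsed arguments."""
    if args.run_all:
        return list(questions)
    if args.question is None:
        return [random.choice(questions)]
    # Question numbers are 1-based on the command line.
    return [questions[args.question - 1]]
```

With this layout, `build_parser().parse_args(["eco", "2"])` selects the second EcoHack question, and an empty argument list falls back to a random CityHack question.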
Sample questions by track
EcoHack
- Which Boston neighborhoods have the worst air quality and why?
- What are the biggest environmental justice concerns in this data?
- How is climate change specifically threatening Boston’s coastline?
CityHack
- Which neighborhoods have the longest 311 response times and what are the equity implications?
- What are the biggest service gaps for non-English speaking residents?
- What patterns suggest systemic inequity in city service delivery?
EduHack
- What are the most significant achievement gaps in Boston public schools?
- How does transportation affect student attendance and outcomes?
- What technology access barriers exist for students and teachers?
JusticeHack
- What racial disparities exist in pretrial detention in Massachusetts?
- How effective are reentry programs at reducing recidivism?
- What does the data reveal about policing patterns in Boston?
Expected output
How it works
The RAG pipeline performs these steps:

Configure the AI stack
Sets up the local LLM and embedding model:
The embedding model (~80 MB) downloads on first use and is cached in ~/.cache/huggingface/hub/

Build the vector index
Chunks the text, computes embeddings, and builds a searchable vector index in memory.

This is the “RAG magic”: the index enables semantic search across the data.
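
To make the chunk, embed, and search steps concrete, here is a toy sketch that uses only the Python standard library. It is not the demo's implementation: word counts stand in for the real embedding model, a plain list stands in for the vector index, and every function name here is hypothetical. The `top_k` parameter plays the same role as the `similarity_top_k` setting mentioned under Troubleshooting.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40) -> list[str]:
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """In-memory 'index': each chunk stored next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Ground the model by placing the retrieved chunks ahead of the question."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

Calling `retrieve(build_index(chunk_text(text)), question)` returns the most similar chunks, and `build_prompt` then hands them to the model alongside the question. That hand-off is what grounds the answer in the data rather than in the model's general knowledge.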
Data flow diagram
Performance tips
Customizing the data
To use your own civic data:

- Add a .txt file to the data/ directory
- Update the TRACKS dictionary in scripts/demo_step2_rag.py
- Run it
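
For the TRACKS update, an entry for a new track might look like the sketch below. The real dictionary's fields are not shown on this page, so the `name` and `file` fields, the `data_path` helper, and the `transit` entry are all placeholders to adapt to whatever the script actually uses. The four existing keys and file names come from the tracks table above.

```python
from pathlib import Path

DATA_DIR = Path("data")

# Hypothetical structure: the real TRACKS dictionary may use different fields.
TRACKS = {
    "eco": {"name": "EcoHack", "file": "ecohack_boston_environment.txt"},
    "city": {"name": "CityHack", "file": "cityhack_boston_311.txt"},
    "edu": {"name": "EduHack", "file": "eduhack_boston_schools.txt"},
    "justice": {"name": "JusticeHack", "file": "justicehack_ma_justice.txt"},
    # New track: key, name, and file name are placeholders.
    "transit": {"name": "TransitHack", "file": "transithack_example.txt"},
}


def data_path(track_key: str) -> Path:
    """Resolve a track key to its data file, with a clear error for unknown keys."""
    try:
        return DATA_DIR / TRACKS[track_key]["file"]
    except KeyError:
        raise ValueError(f"Unknown track {track_key!r}; expected one of {sorted(TRACKS)}") from None
```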
Troubleshooting
Error: embeddings.position_ids UNEXPECTED
This is a harmless warning from the HuggingFace model. The script suppresses it with:
Index building is slow
First run downloads the embedding model (~80 MB). Subsequent runs use the cache and are faster.
Response doesn't cite data
Increase similarity_top_k to retrieve more chunks.

No module named 'llama_index'
Install the dependencies: