The CivicHacks Demo includes four synthetic but realistic civic datasets that demonstrate how open source AI can analyze real-world municipal data. Each dataset is designed to support a specific hackathon track and contains rich, structured information perfect for RAG-based queries.
Available datasets
EcoHack: Boston environment
Air quality, heat islands, water quality, climate resilience, and environmental justice metrics
CityHack: Boston 311 services
Service request analysis, geographic disparities, staffing, language access, and equity gaps
EduHack: Boston public schools
Achievement gaps, chronic absenteeism, technology access, staffing, and college readiness
JusticeHack: MA criminal justice
Incarceration disparities, pretrial detention, recidivism, policing data, and legal representation
Data characteristics
All datasets share these properties:
- Synthetic but realistic: Fabricated for demonstration, but based on real-world patterns and plausible statistics
- Plain text format: Stored as `.txt` files for easy loading and indexing
- Rich context: Each contains specific neighborhoods, demographics, metrics, and equity considerations
- RAG-optimized: Structured to work well with retrieval augmented generation queries
- Civic focus: Designed to surface actionable insights about municipal services and social equity
Using the datasets
Each dataset can be queried using the RAG pipeline in Step 2.
Sample queries across datasets
Environment queries
- Which Boston neighborhoods have the worst air quality and why?
- What are the biggest environmental justice concerns in this data?
- How is climate change specifically threatening Boston’s coastline?
311 services queries
- Which neighborhoods have the longest 311 response times and what are the equity implications?
- What are the biggest service gaps for non-English speaking residents?
- What patterns suggest systemic inequity in city service delivery?
Education queries
- What are the most significant achievement gaps in Boston public schools?
- How does transportation affect student attendance and outcomes?
- What technology access barriers exist for students and teachers?
Criminal justice queries
- What racial disparities exist in pretrial detention in Massachusetts?
- How effective are reentry programs at reducing recidivism?
- What does the data reveal about policing patterns in Boston?
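To make the retrieval half of a RAG query concrete, here is a minimal sketch in pure Python. It is not the demo's Step 2 pipeline: it swaps in a toy keyword-overlap score for whatever retrieval the demo actually uses, and the helper names (`tokenize`, `split_paragraphs`, `top_chunks`) and the sample text are hypothetical placeholders rather than anything from the repository.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_paragraphs(document: str) -> list[str]:
    """Split a plain-text document into non-empty paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by how often they contain words from the query (toy scoring)."""
    query_terms = set(tokenize(query))
    scored = []
    for chunk in chunks:
        counts = Counter(tokenize(chunk))
        score = sum(counts[term] for term in query_terms)
        scored.append((score, chunk))
    # Highest score first; sorted() is stable, so ties keep document order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Drop chunks that share no words with the query.
    return [chunk for score, chunk in ranked[:k] if score > 0]


# Placeholder text for illustration only, NOT taken from the demo datasets.
sample_document = """Placeholder paragraph about air quality monitoring in one neighborhood.

Placeholder paragraph about water quality sampling along a river.

Placeholder paragraph about tree canopy and summer heat."""

query = "Which Boston neighborhoods have the worst air quality and why?"
results = top_chunks(query, split_paragraphs(sample_document), k=1)
print(results)
```

In a full pipeline the selected chunks would be passed to a language model along with the question. Keyword overlap is only the simplest possible stand-in for that retrieval step.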
Data location
All datasets are located in the `data/` directory.
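As a quick way to confirm what is in that directory before indexing, the sketch below lists the `.txt` files it finds. It assumes the script runs from the repository root so that `data` resolves correctly; the function name `list_text_datasets` is hypothetical, and no specific file names are assumed.

```python
from pathlib import Path


def list_text_datasets(data_dir: str = "data") -> list[Path]:
    """Return the .txt files directly inside data_dir, sorted by name."""
    root = Path(data_dir)
    if not root.is_dir():
        return []  # Directory missing: nothing to list.
    return sorted(p for p in root.glob("*.txt") if p.is_file())


for path in list_text_datasets():
    # Report each file's size as a quick sanity check before indexing.
    print(f"{path.name}: {path.stat().st_size} bytes")
```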
Bringing your own data
While these datasets are designed for the demo, Steps 4 and 5 let you plug in your own civic data:
- Drop `.txt`, `.pdf`, `.csv`, or `.docx` files into the `userdata/` directory
- Run `demo_step4_byod.py` for interactive terminal-based exploration
- Run `demo_step5_byod_app.py` for a web-based drag-and-drop interface
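Before running either script, it can help to check that everything in `userdata/` has one of the four supported extensions. The following is a small sketch of such a check; the function name `split_by_support` is hypothetical, and the case-insensitive comparison is an assumption, since how the demo scripts themselves match extensions is not documented here.

```python
from pathlib import Path

# The four extensions the BYOD steps are described as accepting.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".csv", ".docx"}


def split_by_support(user_dir: str = "userdata") -> tuple[list[Path], list[Path]]:
    """Partition files in user_dir into (supported, unsupported) by extension."""
    root = Path(user_dir)
    supported: list[Path] = []
    unsupported: list[Path] = []
    if not root.is_dir():
        return supported, unsupported
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue  # Ignore subdirectories.
        # Assumption: compare case-insensitively so REPORT.PDF counts as .pdf.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


ok, skipped = split_by_support()
print(f"{len(ok)} supported file(s), {len(skipped)} unsupported")
```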
Finding real civic data
If you want to replace these synthetic datasets with real municipal data:

| Source | URL | Coverage |
|---|---|---|
| City of Boston Open Data | https://data.boston.gov | Boston-specific datasets |
| Massachusetts Open Data | https://mass.gov/open-data | Statewide data |
| Federal Open Data | https://data.gov | US federal datasets |

Real civic datasets often require cleaning, normalization, and format conversion before they work well with RAG pipelines. The synthetic datasets in this demo are already plain text structured for retrieval, so they need none of that preparation.
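As one illustration of that preparation, the sketch below converts a CSV export into plain-text lines using only the standard library. The function name `csv_to_text` and the column names and values in the sample are invented for illustration and do not come from any real portal. Real exports will usually need more work, such as date parsing and deduplication.

```python
import csv
import io


def csv_to_text(csv_source: str) -> str:
    """Turn each CSV row into one 'column: value; ...' line of plain text."""
    reader = csv.DictReader(io.StringIO(csv_source))
    lines = []
    for row in reader:
        fields = []
        for column, value in row.items():
            # Skip overflow cells (column is None) and missing cells (value is None).
            if column is None or value is None:
                continue
            # Collapse runs of whitespace, then drop cells that end up empty.
            cleaned = " ".join(value.split())
            if cleaned:
                fields.append(f"{column.strip()}: {cleaned}")
        if fields:
            lines.append("; ".join(fields))
    return "\n".join(lines)


# Hypothetical export; column names and values are invented for illustration.
raw = "neighborhood,request_type,days_open\nExample A,  Pothole ,12\nExample B,,3\n"
print(csv_to_text(raw))
```

Writing each row as labeled text keeps the column names next to their values, so a retrieved line still makes sense without the table header.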