The CivicHacks Demo includes four synthetic but realistic civic datasets that show how open-source AI can analyze municipal data. Each dataset supports one hackathon track and contains detailed, structured text suited to RAG-based queries.

Available datasets

EcoHack: Boston environment

Air quality, heat islands, water quality, climate resilience, and environmental justice metrics

CityHack: Boston 311 services

Service request analysis, geographic disparities, staffing, language access, and equity gaps

EduHack: Boston public schools

Achievement gaps, chronic absenteeism, technology access, staffing, and college readiness

JusticeHack: MA criminal justice

Incarceration disparities, pretrial detention, recidivism, policing data, and legal representation

Data characteristics

All datasets share these properties:
  • Synthetic but realistic: Fabricated for demonstration, but based on real-world patterns and plausible statistics
  • Plain text format: Stored as .txt files for easy loading and indexing
  • Rich context: Each contains specific neighborhoods, demographics, metrics, and equity considerations
  • RAG-optimized: Structured to work well with retrieval-augmented generation (RAG) queries
  • Civic focus: Designed to surface actionable insights about municipal services and social equity

Using the datasets

Each dataset can be queried using the RAG pipeline in Step 2:
# Query the city track (311 services)
python scripts/demo_step2_rag.py city

# Run all three sample questions for the education track
python scripts/demo_step2_rag.py edu --all

# Ask a specific question (question 2) from the environment track
python scripts/demo_step2_rag.py eco 2

When the Gradio web app launches (Step 3), you can switch between all four datasets using the track selector dropdown. The header, description, and example questions update to match the selected track.
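
To run several Step 2 queries in a row, the command-line forms shown above can be assembled in Python. The following is a minimal sketch, not part of the demo: `build_command` is a hypothetical helper, and it mirrors only the three invocation forms listed above (`<track>`, `<track> --all`, and `<track> <number>`). Whether the script accepts other argument combinations is not documented here, so the sketch refuses to combine `--all` with a question number.

```python
import sys

# Script path exactly as it appears in the commands above.
SCRIPT = "scripts/demo_step2_rag.py"


def build_command(track, question=None, run_all=False):
    """Build the argument list for one Step 2 run (hypothetical helper).

    Only the three documented forms are produced:
    `<track>`, `<track> --all`, and `<track> <question number>`.
    """
    if run_all and question is not None:
        # Assumption: the two options are alternatives, since the docs
        # never show them together.
        raise ValueError("choose either run_all or a question number, not both")

    cmd = [sys.executable, SCRIPT, track]
    if run_all:
        cmd.append("--all")
    elif question is not None:
        cmd.append(str(question))
    return cmd
```

From the repository root, a command built this way could be passed to `subprocess.run(cmd, check=True)`, for example `build_command("eco", question=2)` for the third command shown above.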

Sample queries across datasets

EcoHack: Boston environment
  • Which Boston neighborhoods have the worst air quality and why?
  • What are the biggest environmental justice concerns in this data?
  • How is climate change specifically threatening Boston’s coastline?

CityHack: Boston 311 services
  • Which neighborhoods have the longest 311 response times and what are the equity implications?
  • What are the biggest service gaps for non-English speaking residents?
  • What patterns suggest systemic inequity in city service delivery?

EduHack: Boston public schools
  • What are the most significant achievement gaps in Boston public schools?
  • How does transportation affect student attendance and outcomes?
  • What technology access barriers exist for students and teachers?

JusticeHack: MA criminal justice
  • What racial disparities exist in pretrial detention in Massachusetts?
  • How effective are reentry programs at reducing recidivism?
  • What does the data reveal about policing patterns in Boston?

Data location

All datasets are located in the data/ directory:
data/
├── ecohack_boston_environment.txt
├── cityhack_boston_311.txt
├── eduhack_boston_schools.txt
└── justicehack_ma_justice.txt
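
If you want to read a dataset directly rather than through the demo scripts, a minimal sketch follows. It assumes the working directory is the repository root and that the files are UTF-8 text. The file names come from the listing above; `load_dataset` and the track keys are labels chosen for this sketch (`eco`, `city`, and `edu` match the commands shown earlier, while `justice` is a guess).

```python
from pathlib import Path

# File names are copied from the data/ listing above.
# The dictionary keys are illustrative; "justice" in particular is assumed.
DATASET_FILES = {
    "eco": "ecohack_boston_environment.txt",
    "city": "cityhack_boston_311.txt",
    "edu": "eduhack_boston_schools.txt",
    "justice": "justicehack_ma_justice.txt",
}


def load_dataset(track, data_dir="data"):
    """Return the full text of one track's dataset (hypothetical helper)."""
    if track not in DATASET_FILES:
        known = ", ".join(sorted(DATASET_FILES))
        raise ValueError(f"unknown track {track!r}; expected one of: {known}")
    # Assumption: the files are encoded as UTF-8.
    return (Path(data_dir) / DATASET_FILES[track]).read_text(encoding="utf-8")
```

For example, `len(load_dataset("city"))` would report the size of the 311 dataset in characters.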

Bringing your own data

While these datasets are designed for the demo, Steps 4 and 5 let you plug in your own civic data:
  • Drop .txt, .pdf, .csv, or .docx files into the userdata/ directory
  • Run demo_step4_byod.py for interactive terminal-based exploration
  • Run demo_step5_byod_app.py for a web-based drag-and-drop interface
See Bring Your Own Data for details.
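
Before running Step 4 or Step 5, it can help to check which files in userdata/ have one of the supported extensions. The sketch below is an illustration under stated assumptions, not the scripts' own logic: `find_supported_files` is a hypothetical helper, it looks only at the top level of the folder, and it matches extensions case-insensitively. The demo scripts may behave differently on both points.

```python
from pathlib import Path

# Extensions taken from the list above.
SUPPORTED_SUFFIXES = {".txt", ".pdf", ".csv", ".docx"}


def find_supported_files(folder="userdata"):
    """List files in `folder` whose extension is in the supported set.

    Assumptions of this sketch: no recursion into subfolders, and
    case-insensitive extension matching.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Files that do not appear in the result would need converting to one of the four formats first.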

Finding real civic data

If you want to replace these synthetic datasets with real municipal data:
| Source | URL | Coverage |
| --- | --- | --- |
| City of Boston Open Data | https://data.boston.gov | Boston-specific datasets |
| Massachusetts Open Data | https://mass.gov/open-data | Statewide data |
| Federal Open Data | https://data.gov | US federal datasets |

Real civic datasets often need cleaning, normalization, and format conversion before they work well in a RAG pipeline. The synthetic datasets in this demo are already formatted as plain text, so they can be indexed without that preparation.
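
As one illustration of the format conversion mentioned above, the sketch below turns CSV content into plain-text lines, one per record, with whitespace normalized and empty fields dropped. This is an assumption about what a reasonable conversion might look like, not the format the demo's own datasets use; `csv_to_text` is a hypothetical helper, and real data will usually need source-specific cleaning on top of it.

```python
import csv
import io


def csv_to_text(csv_text):
    """Convert CSV content into one 'column: value; ...' line per row.

    Hypothetical helper: collapses runs of whitespace, skips empty
    fields, and skips rows that end up with no fields at all.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        parts = []
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header row have no column name.
                continue
            cleaned = " ".join((value or "").split())
            if cleaned:
                parts.append(f"{column.strip()}: {cleaned}")
        if parts:
            lines.append("; ".join(parts))
    return "\n".join(lines)
```

The resulting text can be saved as a .txt file, which is one of the formats the userdata/ directory accepts.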
