CityHack dataset

The CityHack dataset (cityhack_boston_311.txt) is a synthetic 311 service request analysis for Boston, 2025. It contains detailed metrics on service volumes, resolution times, geographic equity gaps, staffing challenges, language access, and open issues.

Dataset overview

File: data/cityhack_boston_311.txt
Scope: City of Boston 311 Service Request Analysis, 2025
Size: ~31 lines, 13,800+ characters
Format: Plain text, structured report

Data sections

Overview
Top categories
Geographic disparities
Staffing & language
Open issues

System-wide metrics

487,293 service requests processed in 2025 (+12% YoY)
Average resolution time: 8.4 days (up from 7.1 days in 2024)
Customer satisfaction: 3.2/5 stars (down from 3.5 in 2024)

Request channels:

Phone: 38%
Mobile app: 34%
Web portal: 22%
Twitter/social media: 6%

Top 5 request categories

Category	Volume	Avg resolution	Key insight
Street Cleaning & Litter	67,421	3.2 days	45% from Dorchester, Roxbury, Allston-Brighton
Pothole Repair	52,388	11.7 days	23% duplicate reports, Feb-Mar backlog
Code Enforcement	48,102	22.3 days	400% spike Sept 1 & May 31 in Allston-Brighton
Street Light Outages	41,756	14.8 days	Mattapan/Dorchester: 21+ days avg
Tree Maintenance	38,291	45.2 days	Emergency: 2.1 days, routine: 6+ months backlog
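
To put the category volumes in context, they can be compared with the system-wide total of 487,293 requests. The following is a minimal sketch with the figures copied from the table above; the variable names are illustrative, and the derived shares are computed here rather than taken from the dataset.

```python
# Request volumes for the top 5 categories, copied from the table above.
TOTAL_REQUESTS = 487_293

category_volume = {
    "Street Cleaning & Litter": 67_421,
    "Pothole Repair": 52_388,
    "Code Enforcement": 48_102,
    "Street Light Outages": 41_756,
    "Tree Maintenance": 38_291,
}

# Share of all 2025 requests that each category represents, in percent.
category_share = {
    name: volume / TOTAL_REQUESTS * 100
    for name, volume in category_volume.items()
}

top5_total = sum(category_volume.values())
top5_share = top5_total / TOTAL_REQUESTS * 100

for name, share in category_share.items():
    print(f"{name}: {share:.1f}% of all requests")
print(f"Top 5 combined: {top5_total:,} requests ({top5_share:.1f}%)")
```

Under these figures, the five categories together account for roughly half of all requests.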

Resolution times by neighborhood

High-income neighborhoods

Neighborhood	Avg resolution	Satisfaction
Back Bay / Beacon Hill	4.2 days	4.1/5
South Boston Waterfront	5.8 days	3.8/5

Lower-income neighborhoods

Neighborhood	Avg resolution	Satisfaction
Mattapan	14.1 days	2.4/5
Roxbury	12.4 days	2.7/5
Dorchester (Fields Corner)	11.8 days	2.9/5

Key disparity: Neighborhoods with median income <$45K experience resolution times 2.8x longer than neighborhoods with median income >$100K.
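
The neighborhood tables above can also be compared directly. The sketch below computes how much longer each listed lower-income neighborhood waits relative to Back Bay / Beacon Hill. Note that the 2.8x figure is the dataset's aggregate by income band, so ratios computed from these five sample neighborhoods are not expected to match it exactly. The dictionary names are illustrative.

```python
# Average resolution times in days, copied from the neighborhood tables.
high_income = {
    "Back Bay / Beacon Hill": 4.2,
    "South Boston Waterfront": 5.8,
}
lower_income = {
    "Mattapan": 14.1,
    "Roxbury": 12.4,
    "Dorchester (Fields Corner)": 11.8,
}

# Use the fastest listed neighborhood as the comparison baseline.
baseline = high_income["Back Bay / Beacon Hill"]

# Ratio of each lower-income neighborhood's resolution time to the baseline.
ratios = {name: days / baseline for name, days in lower_income.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the Back Bay / Beacon Hill resolution time")
```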

Sample queries

Here are the three built-in queries for the CityHack track:

python scripts/demo_step2_rag.py city 1

Query 1: Which neighborhoods have the longest 311 response times and what are the equity implications?

This query surfaces:

Mattapan: 14.1 days (vs. 4.2 in Back Bay)
Roxbury: 12.4 days
Dorchester: 11.8 days
Income correlation: 2.8x longer times in <$45K income areas
Satisfaction scores: 2.4-2.9 vs. 4.1 in high-income areas

Query 2: What are the biggest service gaps for non-English speaking residents?

This query identifies:

32% longer resolution times for non-English requests
Translation only available for phone (not app/web)
18% lower resolution rate in non-English areas
Spanish (8.2%), Haitian Creole (4.1%), Chinese (2.8%) top languages
Unfunded multilingual chatbot request

Query 3: What patterns suggest systemic inequity in city service delivery?

This query highlights:

2.8x resolution time gap by income
Mattapan street light outages: 21+ days (vs. citywide 14.8)
45% of street cleaning requests from 3 neighborhoods (Dorchester, Roxbury, Allston-Brighton)
Non-English areas: 18% lower resolution rate
22% call abandonment rate (up from 15%)

Key metrics reference

Resolution time disparities

Metric	High-income areas	Low-income areas	Multiplier
Avg resolution	4.2-5.8 days	11.8-14.1 days	2.8x
Customer satisfaction	3.8-4.1/5	2.4-2.9/5	-31% to -42%
Income threshold	>$100K median	<$45K median	—

Request volumes by category

Category	2025 requests	Avg resolution	Notes
Street Cleaning	67,421	3.2 days	Peak: Apr-Jun
Pothole Repair	52,388	11.7 days	23% duplicates
Code Enforcement	48,102	22.3 days	Student move spikes
Street Lights	41,756	14.8 days	15% repeat reports
Tree Maintenance	38,291	45.2 days	6+ month routine backlog

Language access metrics

Language	% of requests	Resolution time impact
English	85.6%	Baseline
Spanish	8.2%	+32%
Haitian Creole	4.1%	+32%
Chinese	2.8%	+32%
Vietnamese	1.9%	+32%
Portuguese	1.4%	+32%

Gap: Translation services unavailable on mobile app and web portal (56% of requests).

Staffing and budget

Metric	2024	2025	Change
Full-time agents	—	42	—
Part-time agents	—	15	—
Avg wait time	4.2 min	7.8 min	+86%
Abandonment rate	15%	22%	+47%
Budget	—	$4.8M	-3% (inflation-adjusted)
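
The Change column for wait time and abandonment rate reports relative change, not a difference in percentage points (the abandonment rate rose 7 points, which is a 47% relative increase). A minimal sketch that reproduces both values from the 2024 and 2025 columns; the helper name `pct_change` is illustrative:

```python
def pct_change(old: float, new: float) -> float:
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100


# Values copied from the staffing and budget table.
wait_change = pct_change(4.2, 7.8)    # average wait time, minutes
abandon_change = pct_change(15, 22)   # call abandonment rate, percent

print(f"Avg wait time: {wait_change:+.0f}%")
print(f"Abandonment rate: {abandon_change:+.0f}%")
```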

Using this dataset in the web app

When you select CityHack in the Gradio app (Step 3), the interface displays:

Header: “🏙️ CityHack — Analyze Boston 311 service requests, equity gaps, and service delivery patterns”
Example questions: All three queries above as clickable examples
Chat responses: AI answers grounded in the specific metrics from this dataset

Try asking follow-up questions like “Why does Mattapan have such long wait times?” or “What’s causing the code enforcement spikes?” to explore the data interactively.

Querying from code

Here’s how the RAG pipeline loads and queries this dataset:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure local AI
Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="all-MiniLM-L6-v2")

# Load the CityHack dataset
data_file = "data/cityhack_boston_311.txt"
documents = SimpleDirectoryReader(input_files=[data_file]).load_data()

# Build vector index
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(streaming=True, similarity_top_k=3)

# Query
response = query_engine.query(
    "Which neighborhoods have the longest 311 response times and what are the equity implications?"
)
response.print_response_stream()

See Step 2: RAG with civic data for the full implementation.
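
To run all three built-in queries in one pass, a small helper can loop over them. This is a sketch, not the tutorial's own implementation: `CITYHACK_QUERIES` and `run_queries` are hypothetical names, and the helper assumes a query engine built without streaming (for example `index.as_query_engine(similarity_top_k=3)`), so that each response can be converted to text with `str()`.

```python
from typing import Any, Dict, List

# The three built-in CityHack queries, copied from the sample queries above.
CITYHACK_QUERIES: List[str] = [
    "Which neighborhoods have the longest 311 response times and what are the equity implications?",
    "What are the biggest service gaps for non-English speaking residents?",
    "What patterns suggest systemic inequity in city service delivery?",
]


def run_queries(query_engine: Any, queries: List[str]) -> Dict[str, str]:
    """Run each query and return a mapping of question to answer text.

    `query_engine` is any object with a .query(str) method, such as the
    engine returned by index.as_query_engine(similarity_top_k=3).
    """
    answers: Dict[str, str] = {}
    for question in queries:
        response = query_engine.query(question)
        answers[question] = str(response)
    return answers
```

With the index from the example above, usage would look like `answers = run_queries(index.as_query_engine(similarity_top_k=3), CITYHACK_QUERIES)`, followed by printing each question and answer pair.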

