The CityHack dataset (cityhack_boston_311.txt) is a synthetic analysis of City of Boston 311 service requests for 2025. It contains detailed metrics on service volumes, resolution times, geographic equity gaps, staffing challenges, language access, and open issues.

Dataset overview

File: data/cityhack_boston_311.txt
Scope: City of Boston 311 Service Request Analysis, 2025
Size: ~31 lines, 13,800+ characters
Format: Plain text, structured report

Data sections

System-wide metrics

  • 487,293 service requests processed in 2025 (+12% YoY)
  • Average resolution time: 8.4 days (up from 7.1 days in 2024)
  • Customer satisfaction: 3.2/5 stars (down from 3.5 in 2024)
Request channels:
  • Phone: 38%
  • Mobile app: 34%
  • Web portal: 22%
  • Twitter/social media: 6%
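
The year-over-year figures above can be turned into relative changes with a few lines of Python. This is a minimal sketch that uses only the numbers listed here. `pct_change` and the other names are illustrative. The implied 2024 volume is approximate because the +12% growth figure is rounded.

```python
# System-wide figures from the CityHack dataset.
VOLUME_2025 = 487_293
VOLUME_YOY_GROWTH = 0.12  # "+12% YoY" (rounded in the source)

resolution_days = {2024: 7.1, 2025: 8.4}
satisfaction = {2024: 3.5, 2025: 3.2}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Back out the approximate 2024 volume implied by the growth rate.
implied_volume_2024 = round(VOLUME_2025 / (1 + VOLUME_YOY_GROWTH))

print(f"Implied 2024 volume: ~{implied_volume_2024:,}")
print(f"Resolution time: {pct_change(resolution_days[2024], resolution_days[2025]):+.1f}%")
print(f"Satisfaction: {pct_change(satisfaction[2024], satisfaction[2025]):+.1f}%")
```

Average resolution time rose about 18% while satisfaction fell about 9%.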

Sample queries

Here are the three built-in queries for the CityHack track. To run Query 1 from the command line:

```bash
python scripts/demo_step2_rag.py city 1
```

Query 1: Which neighborhoods have the longest 311 response times and what are the equity implications?

This query surfaces:
  • Mattapan: 14.1 days (vs. 4.2 in Back Bay)
  • Roxbury: 12.4 days
  • Dorchester: 11.8 days
  • Income correlation: 2.8x longer resolution times in areas with median income below $45K
  • Satisfaction scores: 2.4-2.9 vs. 4.1 in high-income areas
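
The 2.8x figure compares income groups. Comparing each neighborhood directly with Back Bay gives a wider spread, as this small sketch shows. It uses only the resolution times listed for Query 1, and the variable names are illustrative.

```python
# Average resolution times (days) reported by Query 1.
resolution_days = {
    "Mattapan": 14.1,
    "Roxbury": 12.4,
    "Dorchester": 11.8,
    "Back Bay": 4.2,
}

baseline = resolution_days["Back Bay"]

# Ratio of each neighborhood's average to Back Bay's, slowest first.
ratios = {
    name: round(days / baseline, 2)
    for name, days in sorted(resolution_days.items(), key=lambda kv: kv[1], reverse=True)
    if name != "Back Bay"
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x Back Bay")
```

Mattapan's average is about 3.4x Back Bay's, so the neighborhood-level gap is larger than the income-group figure alone suggests.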

Query 2: What are the biggest service gaps for non-English speaking residents?

This query identifies:
  • 32% longer resolution times for non-English requests
  • Translation available only by phone (not in the mobile app or on the web portal)
  • 18% lower resolution rate in non-English-speaking areas
  • Top non-English languages: Spanish (8.2%), Haitian Creole (4.1%), Chinese (2.8%)
  • An unfunded request for a multilingual chatbot

Query 3: What patterns suggest systemic inequity in city service delivery?

This query highlights:
  • 2.8x resolution time gap by income
  • Mattapan street light outages: 21+ days to resolve (vs. 14.8 days citywide)
  • 45% of street cleaning requests come from three neighborhoods (Dorchester, Roxbury, Allston-Brighton)
  • Non-English areas: 18% lower resolution rate
  • 22% call abandonment rate (up from 15%)

Key metrics reference

Geographic equity gap

| Metric | High-income areas | Low-income areas | Gap |
|---|---|---|---|
| Avg resolution | 4.2-5.8 days | 11.8-14.1 days | 2.8x |
| Customer satisfaction | 3.8-4.1/5 | 2.4-2.9/5 | -31% to -42% |
| Income threshold | >$100K median | <$45K median | |

Service categories

| Category | 2025 requests | Avg resolution | Notes |
|---|---|---|---|
| Street Cleaning | 67,421 | 3.2 days | Peak: Apr-Jun |
| Pothole Repair | 52,388 | 11.7 days | 23% duplicates |
| Code Enforcement | 48,102 | 22.3 days | Student move spikes |
| Street Lights | 41,756 | 14.8 days | 15% repeat reports |
| Tree Maintenance | 38,291 | 45.2 days | 6+ month routine backlog |
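
This sketch puts the category table in context. It computes the five categories' combined share of the 487,293 total requests and ranks them by resolution time. It assumes the 23% pothole figure means that 23% of pothole requests are duplicates. The names are illustrative.

```python
# Service categories from the CityHack dataset: (2025 requests, avg resolution days).
TOTAL_REQUESTS = 487_293  # system-wide 2025 volume

categories = {
    "Street Cleaning": (67_421, 3.2),
    "Pothole Repair": (52_388, 11.7),
    "Code Enforcement": (48_102, 22.3),
    "Street Lights": (41_756, 14.8),
    "Tree Maintenance": (38_291, 45.2),
}

# Combined volume of the listed categories and its share of all requests.
top5_total = sum(count for count, _ in categories.values())
print(f"Listed categories: {top5_total:,} requests "
      f"({top5_total / TOTAL_REQUESTS:.1%} of all requests)")

# Rank categories by average resolution time, slowest first.
slowest_first = sorted(categories, key=lambda name: categories[name][1], reverse=True)
print("Slowest to fastest:", ", ".join(slowest_first))

# Rough duplicate count, assuming 23% of pothole requests are duplicates.
pothole_duplicates = round(categories["Pothole Repair"][0] * 0.23)
print(f"Estimated duplicate pothole reports: ~{pothole_duplicates:,}")
```

These five categories account for roughly half of all 2025 requests.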
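
Language access

| Language | % of requests | Resolution time vs. English |
|---|---|---|
| English | 85.6% | Baseline |
| Spanish | 8.2% | +32% |
| Haitian Creole | 4.1% | +32% |
| Chinese | 2.8% | +32% |
| Vietnamese | 1.9% | +32% |
| Portuguese | 1.4% | +32% |

Gap: Translation services are unavailable on the mobile app and web portal, which together carry 56% of requests.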
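
The 56% figure is the combined share of the two channels that lack translation: mobile app (34%) plus web portal (22%), taken from the request-channel breakdown in the system-wide metrics. Here is a minimal sketch of that arithmetic. It also converts the share into an approximate request count. The variable names are illustrative.

```python
# 2025 request-channel shares (percent) from the system-wide metrics.
TOTAL_REQUESTS = 487_293
channel_share = {"Phone": 38, "Mobile app": 34, "Web portal": 22, "Twitter/social media": 6}

# Channels the dataset lists as lacking translation services.
no_translation = ["Mobile app", "Web portal"]

uncovered_pct = sum(channel_share[c] for c in no_translation)
uncovered_requests = round(TOTAL_REQUESTS * uncovered_pct / 100)

print(f"Share of requests on channels without translation: {uncovered_pct}%")
print(f"Approximate request count: ~{uncovered_requests:,}")
```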

Staffing and call center

| Metric | 2024 | 2025 | Change |
|---|---|---|---|
| Avg wait time | 4.2 min | 7.8 min | +86% |
| Abandonment rate | 15% | 22% | +47% |

Agents: 42 full-time, 15 part-time. Budget: $4.8M (-3%, inflation-adjusted).
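
The Change column reports relative change. For a rate such as abandonment, that is easy to confuse with a percentage-point difference. This sketch computes both from the 2024 and 2025 values. The function name is illustrative.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


wait_min = (4.2, 7.8)        # average wait time in minutes, 2024 -> 2025
abandonment = (15.0, 22.0)   # call abandonment rate in percent, 2024 -> 2025

print(f"Wait time: {pct_change(*wait_min):+.0f}%")
print(f"Abandonment: {pct_change(*abandonment):+.0f}% relative, "
      f"{abandonment[1] - abandonment[0]:+.0f} percentage points")
```

Abandonment rose by 7 percentage points, which is a 47% relative increase.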

Using this dataset in the web app

When you select CityHack in the Gradio app (Step 3), the interface displays:
  • Header: “🏙️ CityHack — Analyze Boston 311 service requests, equity gaps, and service delivery patterns”
  • Example questions: All three queries above as clickable examples
  • Chat responses: AI answers grounded in the specific metrics from this dataset
Try asking follow-up questions like “Why does Mattapan have such long wait times?” or “What’s causing the code enforcement spikes?” to explore the data interactively.

Querying from code

Here’s how the RAG pipeline loads and queries this dataset:
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure local AI: the LLM is served by a locally running Ollama instance
# (the llama3.1 model must already be pulled); embeddings are computed locally.
Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="all-MiniLM-L6-v2")

# Load the CityHack dataset
data_file = "data/cityhack_boston_311.txt"
documents = SimpleDirectoryReader(input_files=[data_file]).load_data()

# Build vector index (splits the text into chunks and embeds each chunk)
index = VectorStoreIndex.from_documents(documents)
# Retrieve the 3 most similar chunks per query and stream the LLM's answer
query_engine = index.as_query_engine(streaming=True, similarity_top_k=3)

# Query
response = query_engine.query(
    "Which neighborhoods have the longest 311 response times and what are the equity implications?"
)
response.print_response_stream()
```
See Step 2: RAG with civic data for the full implementation.
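
With `similarity_top_k=3`, the query engine retrieves the three stored chunks whose embeddings are most similar to the query's embedding and passes them to the LLM as context. The sketch below illustrates only that retrieval step. It replaces the `all-MiniLM-L6-v2` embeddings with a toy bag-of-words cosine similarity and uses a few hand-written chunks paraphrased from this page. It is not how LlamaIndex computes embeddings, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a toy stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = bag_of_words(query)
    return sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]


# Hypothetical chunks paraphrased from the dataset summary above.
chunks = [
    "Mattapan has the longest 311 response times at 14.1 days, versus 4.2 days in Back Bay.",
    "Translation is only available for phone requests, not for app or web.",
    "Tree maintenance requests average 45.2 days to resolve, with a 6+ month routine backlog.",
    "Call abandonment rose to 22 percent and average wait times reached 7.8 minutes.",
]

for chunk in top_k("Which neighborhoods have the longest 311 response times?", chunks, k=2):
    print(chunk)
```

Dense embeddings like the ones in the real pipeline can also match paraphrases (for example, "wait times" against "resolution times"), which simple word overlap cannot.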
