The EcoHack dataset (ecohack_boston_environment.txt) is a synthetic environmental quality report for Boston, Q3 2025. It contains detailed metrics on air quality, urban heat islands, water quality, waste management, climate resilience, and environmental justice impacts.

Dataset overview

File: data/ecohack_boston_environment.txt
Scope: City of Boston Environmental Quality Report, Q3 2025
Size: ~20 lines, 8,100+ characters
Format: Plain text, structured report
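To check the size figures above against a local copy, a minimal sketch like the following counts lines and characters. The helper name `text_stats` is hypothetical and not part of the demo scripts; the path is the one listed above and is only read if it exists.

```python
from pathlib import Path


def text_stats(text: str) -> dict:
    """Count lines and characters in a block of text."""
    return {"lines": len(text.splitlines()), "characters": len(text)}


# Path taken from the overview above; skipped silently if the file is absent.
path = Path("data/ecohack_boston_environment.txt")
if path.exists():
    print(text_stats(path.read_text(encoding="utf-8")))
```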

Data sections

Air quality summary

  • Boston’s AQI averaged 42 (Good) during Q3 2025, a 7% improvement over Q3 2024
  • 12 days exceeded the “Moderate” threshold (AQI > 100), concentrated in the July heat wave
  • Roxbury monitoring station: 15.3 µg/m³ PM2.5 (28% above citywide average)
  • East Boston: 22 ppb NO2 near airport corridor (citywide avg: 14 ppb)
The data explicitly shows environmental justice communities face disproportionate air quality impacts.
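The disparity figures above are simple ratios, and the arithmetic can be reproduced as a quick check. This is a sketch with hypothetical helper names; the citywide PM2.5 average is not quoted on this page, so the value below is derived from the 28% figure rather than read from the report.

```python
def percent_above(value: float, baseline: float) -> float:
    """Return how far `value` sits above `baseline`, as a percentage."""
    return (value - baseline) / baseline * 100


def implied_baseline(value: float, percent_over: float) -> float:
    """Back out the baseline from a value reported as `percent_over` % above it."""
    return value / (1 + percent_over / 100)


# East Boston NO2 (22 ppb) against the citywide average (14 ppb)
no2_gap = percent_above(22, 14)

# Derived, not quoted: Roxbury's 15.3 µg/m³ is stated to be 28% above citywide.
citywide_pm25 = implied_baseline(15.3, 28)

print(f"East Boston NO2: {no2_gap:.0f}% above citywide")
print(f"Implied citywide PM2.5: {citywide_pm25:.2f} µg/m³")
```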

Sample queries

The EcoHack track has three built-in queries. To run the first one:
python scripts/demo_step2_rag.py eco 1

Query 1: Which Boston neighborhoods have the worst air quality and why?

This query surfaces:
  • Roxbury’s elevated PM2.5 levels (15.3 µg/m³)
  • East Boston’s NO2 concentrations near the airport
  • Heat wave impacts in July (12 days exceeding Moderate AQI)
  • The 28% disparity between Roxbury and citywide averages

Query 2: What are the biggest environmental justice concerns in this data?

This query identifies:
  • 35% more poor air quality days in EJ communities
  • 2.4x higher childhood asthma hospitalization rates
  • 40% less green space per capita
  • 3.1x greater proximity to contaminated sites
  • $12M in FY2025 targeted funding (up 60% year over year)

Query 3: How is climate change specifically threatening Boston’s coastline?

This query highlights:
  • 1.5 feet of sea level rise by 2050
  • $80 billion in at-risk property
  • Affected areas: East Boston, Seaport District, Dorchester waterfront
  • 47 miles of planned coastal adaptation (8% complete)
  • 3 of 12 resilience projects finished
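The two progress figures above measure different things, which a short calculation makes concrete. This sketch assumes the 8% refers to the share of the 47 planned miles that is complete; the variable names are illustrative only.

```python
planned_miles = 47
share_complete = 0.08  # assumed to be the share of planned mileage
projects_done, projects_total = 3, 12

miles_complete = planned_miles * share_complete
miles_remaining = planned_miles - miles_complete
project_rate = projects_done / projects_total * 100

print(f"Coastal adaptation built: {miles_complete:.1f} of {planned_miles} miles")
print(f"Still to build: {miles_remaining:.1f} miles")
print(f"Resilience projects finished: {project_rate:.0f}%")
```

Under that assumption, a quarter of the projects are finished while under a tenth of the planned mileage is in place, so a grounded answer should not treat the two numbers as interchangeable.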

Key metrics reference

| Metric | Value | Benchmark |
| --- | --- | --- |
| Citywide AQI | 42 (Good) | 7% improvement YoY |
| Days exceeding Moderate | 12 | Concentrated in July |
| Roxbury PM2.5 | 15.3 µg/m³ | 28% above citywide |
| East Boston NO2 | 22 ppb | 57% above citywide (14 ppb) |

| Neighborhood | Canopy % | Temperature impact |
| --- | --- | --- |
| East Boston | 9% | 8-12°F hotter |
| Dorchester | 11% | 8-12°F hotter |
| Mattapan | 13% | 8-12°F hotter |
| 30%+ canopy areas | 30%+ | Baseline |
| City goal (2035) | 35% | Requires 120K new trees |

| Impact | EJ community multiplier |
| --- | --- |
| Poor air quality days | +35% |
| Childhood asthma hospitalization | 2.4x |
| Green space access | -40% (less per capita) |
| Proximity to contaminated sites | 3.1x |
| FY2025 targeted funding | $12M (+60% YoY) |
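As one way to work with the canopy figures above, the sketch below ranks the three low-canopy neighborhoods by their distance from the 35% goal. Comparing individual neighborhoods against the 2035 city goal is an illustrative assumption here, since the goal may be defined citywide rather than per neighborhood.

```python
CITY_GOAL_2035 = 35  # percent canopy cover, from the table above

# Canopy cover in percent, copied from the table above
canopy_pct = {"East Boston": 9, "Dorchester": 11, "Mattapan": 13}

# Gap to the goal in percentage points, largest gap first
gaps = {name: CITY_GOAL_2035 - pct for name, pct in canopy_pct.items()}
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)

for name, gap in ranked:
    print(f"{name}: {gap} percentage points below the 2035 goal")
```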

Using this dataset in the web app

When you select EcoHack in the Gradio app (Step 3), the interface displays:
  • Header: “🌿 EcoHack — Analyze Boston environmental quality, climate resilience, and environmental justice data”
  • Example questions: All three queries above as clickable examples
  • Chat responses: AI answers grounded in the specific metrics from this dataset
Try asking follow-up questions like “Which neighborhoods need the most tree planting?” or “What’s causing the CSO increase?” to explore the data interactively.

Querying from code

Here’s how the RAG pipeline loads and queries this dataset:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure local AI
Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="all-MiniLM-L6-v2")

# Load the EcoHack dataset
data_file = "data/ecohack_boston_environment.txt"
documents = SimpleDirectoryReader(input_files=[data_file]).load_data()

# Build vector index
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(streaming=True, similarity_top_k=3)

# Query
response = query_engine.query(
    "Which Boston neighborhoods have the worst air quality and why?"
)
response.print_response_stream()
See Step 2: RAG with civic data for the full implementation.
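The `similarity_top_k=3` setting in the code above asks the query engine for the three chunks most similar to the question. To illustrate the idea without a running model, here is a toy sketch that swaps in bag-of-words cosine similarity for the real embedding model. It is not how LlamaIndex scores chunks internally, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)[:k]


# Sample chunks copied from the metrics quoted on this page
chunks = [
    "Roxbury monitoring station: 15.3 µg/m³ PM2.5 (28% above citywide average)",
    "East Boston: 22 ppb NO2 near airport corridor (citywide avg: 14 ppb)",
    "47 miles of planned coastal adaptation (8% complete)",
    "3 of 12 resilience projects finished",
]

results = top_k("PM2.5 at the Roxbury monitoring station", chunks, k=2)
print(results[0])
```

A real embedding model also matches on meaning rather than shared words alone, which is why the pipeline above uses `HuggingFaceEmbedding` instead of keyword overlap.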
