EcoHack dataset

The EcoHack dataset (ecohack_boston_environment.txt) is a synthetic environmental quality report for Boston, Q3 2025. It contains detailed metrics on air quality, urban heat islands, water quality, waste management, climate resilience, and environmental justice impacts.

Dataset overview

File: data/ecohack_boston_environment.txt
Scope: City of Boston Environmental Quality Report, Q3 2025
Size: ~20 lines, 8,100+ characters
Format: Plain text, structured report
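To confirm that a local copy matches the size listed above, a short check along these lines may help. It is a minimal sketch that assumes the file is UTF-8 encoded and that the script runs from the repository root; `describe_text_file` is a hypothetical helper name, not part of the tutorial code.

```python
from pathlib import Path


def describe_text_file(path):
    """Return (line_count, character_count) for a UTF-8 text file."""
    text = Path(path).read_text(encoding="utf-8")
    return len(text.splitlines()), len(text)


if __name__ == "__main__":
    # Path taken from the overview above; adjust if your checkout differs.
    data_file = Path("data/ecohack_boston_environment.txt")
    if data_file.exists():
        lines, chars = describe_text_file(data_file)
        print(f"{data_file}: {lines} lines, {chars} characters")
    else:
        print(f"{data_file} not found; run this from the repository root")
```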

Data sections

Air quality
Heat islands
Water quality
Waste & climate
Environmental justice

Air quality summary

Boston’s AQI averaged 42 (Good) during Q3 2025, a 7% improvement over Q3 2024
12 days exceeded the “Moderate” range (AQI > 100), concentrated in the July heat wave
Roxbury monitoring station: 15.3 µg/m³ PM2.5 (28% above citywide average)
East Boston: 22 ppb NO2 near airport corridor (citywide avg: 14 ppb)

The data explicitly shows environmental justice communities face disproportionate air quality impacts.

Sample queries

The EcoHack track ships with three built-in queries. The command below runs the first one:

python scripts/demo_step2_rag.py eco 1
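The internals of scripts/demo_step2_rag.py are not shown on this page. As a rough illustration only, and assuming the first argument names the track and the second is a 1-based query number, the lookup could resemble the sketch below. `BUILT_IN_QUERIES` and `select_query` are hypothetical names.

```python
# Hypothetical lookup table; the real script may organize its queries differently.
BUILT_IN_QUERIES = {
    "eco": [
        "Which Boston neighborhoods have the worst air quality and why?",
        "What are the biggest environmental justice concerns in this data?",
        "How is climate change specifically threatening Boston’s coastline?",
    ],
}


def select_query(track, number):
    """Return the built-in query for a track, using a 1-based query number."""
    queries = BUILT_IN_QUERIES.get(track)
    if queries is None:
        raise ValueError(f"unknown track: {track!r}")
    if not 1 <= number <= len(queries):
        raise ValueError(f"query number must be between 1 and {len(queries)}")
    return queries[number - 1]


# Mirrors the arguments of the command above: track "eco", query 1.
print(select_query("eco", 1))
```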

Query 1: Which Boston neighborhoods have the worst air quality and why?

This query surfaces:

Roxbury’s elevated PM2.5 levels (15.3 µg/m³)
East Boston’s NO2 concentrations near the airport
Heat wave impacts in July (12 days exceeding Moderate AQI)
The 28% disparity between Roxbury and citywide averages

Query 2: What are the biggest environmental justice concerns in this data?

This query identifies:

35% more poor air quality days in EJ communities
2.4x higher childhood asthma hospitalization rates
40% less green space per capita
3.1x proximity to contaminated sites
The $12M in FY2025 targeted funding (up 60% year over year)

Query 3: How is climate change specifically threatening Boston’s coastline?

This query highlights:

1.5 feet of sea level rise by 2050
$80 billion in at-risk property
Affected areas: East Boston, Seaport District, Dorchester waterfront
47 miles of planned coastal adaptation (8% complete)
3 of 12 resilience projects finished
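The coastal figures can be turned into progress numbers with simple arithmetic. The sketch below assumes that "8% complete" refers to the share of the 47 planned miles; the dataset summary does not spell this out.

```python
# Figures quoted in the Query 3 summary above.
planned_miles = 47        # planned coastal adaptation
share_complete = 0.08     # assumed to be a share of the 47 miles
projects_done, projects_total = 3, 12

completed_miles = planned_miles * share_complete
remaining_miles = planned_miles - completed_miles
project_completion = projects_done / projects_total

print(f"Completed: {completed_miles:.2f} miles")
print(f"Remaining: {remaining_miles:.2f} miles")
print(f"Projects finished: {project_completion:.0%}")
```

Under that assumption, roughly 3.76 miles are complete and a quarter of the resilience projects are finished.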

Key metrics reference

Air quality metrics

Metric	Value	Context
Citywide AQI	42 (Good)	7% improvement YoY
Days exceeding Moderate	12	Concentrated in July
Roxbury PM2.5	15.3 µg/m³	28% above citywide
East Boston NO2	22 ppb	57% above citywide (14 ppb)
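The percentage comparisons in this table follow from the raw values. As a quick check, the sketch below recomputes the East Boston NO2 gap and backs out the citywide PM2.5 average implied by the Roxbury reading. The citywide PM2.5 value is derived here, not quoted from the dataset, and the function names are illustrative.

```python
def percent_above(value, baseline):
    """Percentage by which value exceeds baseline."""
    return (value - baseline) / baseline * 100


def implied_baseline(value, pct_above):
    """Baseline implied by a value that sits pct_above percent over it."""
    return value / (1 + pct_above / 100)


# East Boston NO2 (22 ppb) against the citywide average (14 ppb).
no2_gap = percent_above(22, 14)

# Roxbury PM2.5 (15.3 µg/m³) is listed as 28% above citywide.
citywide_pm25 = implied_baseline(15.3, 28)

print(f"East Boston NO2 is {no2_gap:.1f}% above citywide")
print(f"Implied citywide PM2.5: {citywide_pm25:.2f} µg/m³")
```

The NO2 gap rounds to the 57% shown in the table, and the implied citywide PM2.5 average comes out just under 12 µg/m³.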

Tree canopy coverage

Neighborhood	Canopy %	Temperature impact
East Boston	9%	8-12°F hotter
Dorchester	11%	8-12°F hotter
Mattapan	13%	8-12°F hotter
30%+ canopy areas	30%+	Baseline
City goal (2035)	35%	Requires 120K new trees
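One way to read this table is as a gap between each neighborhood's canopy and the 2035 goal. The sketch below ranks the three listed neighborhoods by that gap. Comparing a neighborhood figure against a citywide target is an assumption made for illustration; the dataset does not say the goal applies per neighborhood.

```python
CANOPY_GOAL_PCT = 35  # city goal for 2035, from the table above

# Canopy coverage (%) for the neighborhoods listed in the table.
canopy_pct = {"East Boston": 9, "Dorchester": 11, "Mattapan": 13}

# Percentage-point gap to the goal, largest gap first.
ranked = sorted(
    ((name, CANOPY_GOAL_PCT - pct) for name, pct in canopy_pct.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, gap in ranked:
    print(f"{name}: {gap} percentage points below the 35% goal")
```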

Environmental justice disparities

Impact	EJ community multiplier
Poor air quality days	+35%
Childhood asthma hospitalization	2.4x
Green space access	-40% (less per capita)
Proximity to contaminated sites	3.1x
FY2025 targeted funding	$12M (+60% YoY)
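The funding row gives both a level and a growth rate, so the prior-year amount can be backed out. This sketch assumes that $12M is the FY2025 total and that "+60% YoY" is measured against the previous fiscal year; the derived figures are not quoted in the dataset.

```python
fy2025_funding_musd = 12.0   # $12M, from the table above
yoy_growth = 0.60            # +60% year over year

# Previous-year funding implied by the growth rate.
prior_year_musd = fy2025_funding_musd / (1 + yoy_growth)
increase_musd = fy2025_funding_musd - prior_year_musd

print(f"Implied prior-year funding: ${prior_year_musd:.1f}M")
print(f"Implied increase: ${increase_musd:.1f}M")
```

Under these assumptions the prior-year level is about $7.5M, which makes the increase about $4.5M.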

Using this dataset in the web app

When you select EcoHack in the Gradio app (Step 3), the interface displays:

Header: “🌿 EcoHack — Analyze Boston environmental quality, climate resilience, and environmental justice data”
Example questions: All three queries above as clickable examples
Chat responses: AI answers grounded in the specific metrics from this dataset

Try asking follow-up questions like “Which neighborhoods need the most tree planting?” or “What’s causing the CSO increase?” to explore the data interactively.
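The app's own source is not reproduced on this page. As a loose sketch of how a chat interface with clickable examples can be wired up, the code below uses Gradio's `gr.ChatInterface`. The names `make_chat_fn` and `build_app` are hypothetical, and `query_fn` stands in for whatever function returns an answer string from the RAG pipeline.

```python
EXAMPLES = [
    "Which Boston neighborhoods have the worst air quality and why?",
    "What are the biggest environmental justice concerns in this data?",
    "How is climate change specifically threatening Boston’s coastline?",
]


def make_chat_fn(query_fn):
    """Adapt a text-in, text-out function to the (message, history) signature."""

    def chat(message, history):
        # The chat history is ignored in this sketch; each question stands alone.
        return str(query_fn(message))

    return chat


def build_app(query_fn):
    """Build a chat UI with the three sample queries as clickable examples."""
    import gradio as gr  # imported lazily so the helpers above work without Gradio

    return gr.ChatInterface(
        fn=make_chat_fn(query_fn),
        title="🌿 EcoHack",
        description=(
            "Analyze Boston environmental quality, climate resilience, "
            "and environmental justice data"
        ),
        examples=EXAMPLES,
    )
```

With a non-streaming query engine available, something like `build_app(lambda q: query_engine.query(q)).launch()` would start the interface locally.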

Querying from code

Here’s how the RAG pipeline loads and queries this dataset:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure local AI
Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="all-MiniLM-L6-v2")

# Load the EcoHack dataset
data_file = "data/ecohack_boston_environment.txt"
documents = SimpleDirectoryReader(input_files=[data_file]).load_data()

# Build vector index
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(streaming=True, similarity_top_k=3)

# Query
response = query_engine.query(
    "Which Boston neighborhoods have the worst air quality and why?"
)
response.print_response_stream()
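To see which chunks of the report were retrieved for an answer, the response's `source_nodes` can be inspected. The helper below is a small sketch, not part of the tutorial code: `summarize_sources` is a hypothetical name, and it relies only on each item exposing a `score` attribute and a `get_content()` method, as LlamaIndex's `NodeWithScore` does.

```python
def summarize_sources(source_nodes, preview_chars=80):
    """Return one line per retrieved chunk: similarity score plus a text preview."""
    lines = []
    for item in source_nodes:
        # Collapse whitespace so multi-line chunks fit on one line.
        text = " ".join(item.get_content().split())
        score_text = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"[{score_text}] {text[:preview_chars]}")
    return lines


# Usage with the response object from the pipeline above:
# for line in summarize_sources(response.source_nodes):
#     print(line)
```

With `similarity_top_k=3`, this would list up to three chunks per query.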

See Step 2: RAG with civic data for the full implementation.
