The EduHack dataset (eduhack_boston_schools.txt) is a synthetic educational equity report for Boston Public Schools, 2024-2025. It contains detailed metrics on enrollment demographics, MCAS achievement gaps, chronic absenteeism, technology access, teacher staffing, and college readiness.

Dataset overview

File: data/eduhack_boston_schools.txt
Scope: Boston Public Schools Educational Equity Report, 2024-2025
Size: ~51 lines, 22,000+ characters
Format: Plain text, structured report

Data sections

District overview

  • 49,152 students across 125 schools
  • 78 home languages represented
Demographics:
  • Hispanic/Latino: 43.8%
  • Black: 29.1%
  • White: 14.2%
  • Asian: 8.9%
  • Multiracial: 3.2%
Special populations:
  • Low-income families: 62%
  • English Language Learners: 32.4%
  • Special education services: 21.7%
  • Experiencing homelessness: 4.8% (2,359 students)
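
The percentages above can be converted to approximate headcounts by multiplying by the district enrollment. The following is a minimal sketch using only figures quoted in this section. The names `special_populations` and `estimate_count` are illustrative, and the resulting counts are rounded estimates rather than values from the dataset (the homelessness figure is the one count the report states directly, 2,359).

```python
# Approximate headcounts from the district overview percentages.
# Counts are derived estimates, not figures taken from the dataset.
TOTAL_STUDENTS = 49_152

special_populations = {
    "Low-income families": 62.0,
    "English Language Learners": 32.4,
    "Special education services": 21.7,
    "Experiencing homelessness": 4.8,
}


def estimate_count(percent: float, total: int = TOTAL_STUDENTS) -> int:
    """Return the rounded number of students that a percentage represents."""
    return round(total * percent / 100)


estimates = {group: estimate_count(pct) for group, pct in special_populations.items()}

for group, count in estimates.items():
    print(f"{group}: ~{count:,} students")
```

The homelessness estimate reproduces the 2,359 students quoted above, which is a quick consistency check on the enrollment total.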

Sample queries

The EduHack track has three built-in queries. Each is selected from the command line by track name and query number, for example:
python scripts/demo_step2_rag.py edu 1

Query 1: What are the most significant achievement gaps in Boston public schools?

This query surfaces:
  • Math: White-Black gap of 38 percentage points (52% vs. 14%)
  • ELA: White-Black gap of 41 percentage points (63% vs. 22%)
  • ELL students: 8% math proficiency, 5% ELA proficiency
  • Low-income vs. non-low-income: 34-point math gap, 34-point ELA gap
  • No improvement in 5 years for White-Black math gap

Query 2: How does transportation affect student attendance and outcomes?

This query identifies:
  • Students with 45+ minute commutes: 2.3x more likely to be chronically absent
  • District-wide chronic absenteeism: 38.2%
  • High schools: 47.3% chronic absenteeism rate
  • Commute length is the strongest predictor of absenteeism in the data
  • Superintendent recommendation to expand transportation options

Query 3: What technology access barriers exist for students and teachers?

This query highlights:
  • 18% unreliable home internet, 7% no internet at all
  • 3,400 eligible families still unconnected despite Comcast partnership
  • 34% of 6th graders lack digital literacy to navigate educational software
  • 45% of teachers feel unprepared to integrate AI tools
  • 1:1 device ratio achieved but connectivity gaps remain
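
The internals of `scripts/demo_step2_rag.py` are not shown on this page. As a rough sketch only, a command of the form `edu 1` could be resolved with a lookup from track name and query number to the query text. `BUILT_IN_QUERIES` and `resolve_query` are hypothetical names, and only the three query strings come from this page.

```python
# Hypothetical sketch: map "<track> <number>" arguments to a built-in query.
# The real scripts/demo_step2_rag.py may be organised differently.
BUILT_IN_QUERIES = {
    "edu": [
        "What are the most significant achievement gaps in Boston public schools?",
        "How does transportation affect student attendance and outcomes?",
        "What technology access barriers exist for students and teachers?",
    ],
}


def resolve_query(track: str, number: int) -> str:
    """Return the built-in query for a track, using 1-based numbering."""
    queries = BUILT_IN_QUERIES.get(track)
    if queries is None:
        raise ValueError(f"Unknown track: {track!r}")
    if not 1 <= number <= len(queries):
        raise ValueError(f"Query number must be between 1 and {len(queries)}")
    return queries[number - 1]


print(resolve_query("edu", 1))
```

Rejecting out-of-range numbers explicitly avoids a silent wrap-around from negative list indexes.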

Key metrics reference

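MCAS math proficiency

| Student group | Proficiency | Gap vs. White | Gap vs. Overall |
| --- | --- | --- | --- |
| Asian | 58% | +6 pts | +30 pts |
| White | 52% | Baseline | +24 pts |
| Overall | 28% | | Baseline |
| Low-income | 17% | -35 pts | -11 pts |
| Hispanic/Latino | 16% | -36 pts | -12 pts |
| Black | 14% | -38 pts | -14 pts |
| ELL | 8% | -44 pts | -20 pts |
| Students w/ disabilities | 6% | -46 pts | -22 pts |
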
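
The gap columns are differences in percentage points between a group's proficiency rate and a baseline rate. The sketch below recomputes them from the proficiency rates listed above. The dictionary and the `gap` helper are illustrative names, not part of the dataset or the demo scripts.

```python
# Recompute the gap columns from the proficiency rates listed above.
proficiency = {
    "Asian": 58,
    "White": 52,
    "Overall": 28,
    "Low-income": 17,
    "Hispanic/Latino": 16,
    "Black": 14,
    "ELL": 8,
    "Students w/ disabilities": 6,
}


def gap(group: str, baseline: str) -> int:
    """Percentage-point difference between a group and a baseline group."""
    return proficiency[group] - proficiency[baseline]


for group in proficiency:
    print(
        f"{group}: {gap(group, 'White'):+d} pts vs. White, "
        f"{gap(group, 'Overall'):+d} pts vs. Overall"
    )
```

These are point differences, not relative changes: a 38-point gap between 52% and 14% means the lower rate is roughly a quarter of the higher one.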
Chronic absenteeism

| Group | Rate | Notes |
| --- | --- | --- |
| Students experiencing homelessness | 62% | Highest impact |
| High schools (avg) | 47.3% | 5 schools >55% |
| Black students | 44% | |
| Hispanic/Latino students | 41% | |
| District-wide | 38.2% | Up from 33% pre-pandemic |
| White students | 28% | |
| Asian students | 22% | Lowest rate |
| 45+ min commute | 2.3x | Strongest predictor |

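
The district-wide rate is described as up from 33% before the pandemic. A short sketch of the difference between the percentage-point change and the relative change, using only those two figures (variable names are illustrative):

```python
# Chronic absenteeism: pre-pandemic vs. current district-wide rate.
PRE_PANDEMIC_RATE = 33.0
CURRENT_RATE = 38.2

# Absolute change, in percentage points.
point_change = CURRENT_RATE - PRE_PANDEMIC_RATE

# Relative change, as a percentage of the pre-pandemic rate.
relative_change = point_change / PRE_PANDEMIC_RATE * 100

print(f"Change: {point_change:+.1f} percentage points")
print(f"Relative increase: {relative_change:.1f}%")
```

Keeping the two measures separate matters when comparing groups, because the same point change is a larger relative shift for a group that started from a lower rate.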
Technology access

| Metric | Value | Context |
| --- | --- | --- |
| Device ratio | 1:1 | One Chromebook per student |
| Unreliable home internet | 18% | |
| No home internet | 7% | |
| Comcast Essentials enrolled | 8,200 families | Partnership program |
| Eligible but unconnected | 3,400 families | Gap remains |
| 6th graders lacking digital literacy | 34% | Cannot navigate software |
| Teachers unprepared for AI tools | 45% | Professional development gap |

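
The two family counts can be combined into a rough take-up rate for the partnership program. This sketch assumes the enrolled and the eligible-but-unconnected families do not overlap and together make up every eligible family. The dataset summary on this page does not state that explicitly, so treat the result as an estimate.

```python
# Estimated take-up of the Comcast partnership among eligible families.
ENROLLED_FAMILIES = 8_200
UNCONNECTED_ELIGIBLE_FAMILIES = 3_400

# Assumption: the two groups are disjoint and cover all eligible families.
eligible_families = ENROLLED_FAMILIES + UNCONNECTED_ELIGIBLE_FAMILIES
enrollment_rate = ENROLLED_FAMILIES / eligible_families * 100

print(f"Eligible families (estimated): {eligible_families:,}")
print(f"Enrolled share of eligible families: {enrollment_rate:.1f}%")
```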
Teacher staffing

| Metric | Value | Context |
| --- | --- | --- |
| Total vacancies (SY2024-25) | 127 | 5.2% vacancy rate |
| Special Education vacancies | 42 | Highest need area |
| Math vacancies | 28 | |
| Science vacancies | 22 | |
| Bilingual Education vacancies | 19 | |
| Students of color | 64% | Black + Hispanic/Latino |
| Teachers of color | 39% | Demographic mismatch |
| Roxbury turnover | 28% | Highest in district |
| Mattapan turnover | 25% | Second highest |
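
The four subject-area vacancy counts do not add up to the district total, so some vacancies fall outside the listed areas. A small sketch of that breakdown, using only the figures above (names are illustrative):

```python
# Break down the vacancy total by the subject areas listed above.
TOTAL_VACANCIES = 127

vacancies_by_area = {
    "Special Education": 42,
    "Math": 28,
    "Science": 22,
    "Bilingual Education": 19,
}

listed_total = sum(vacancies_by_area.values())
other_vacancies = TOTAL_VACANCIES - listed_total

# Each area's share of all vacancies, as a percentage.
shares = {
    area: count / TOTAL_VACANCIES * 100 for area, count in vacancies_by_area.items()
}

for area, share in shares.items():
    print(f"{area}: {vacancies_by_area[area]} vacancies ({share:.1f}% of total)")
print(f"Other areas: {other_vacancies} vacancies")
```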

Using this dataset in the web app

When you select EduHack in the Gradio app (Step 3), the interface displays:
  • Header: “📚 EduHack — Analyze Boston public schools equity, achievement gaps, and student outcomes data”
  • Example questions: All three queries above as clickable examples
  • Chat responses: AI answers grounded in the specific metrics from this dataset
Try asking follow-up questions like “Which schools have the highest absenteeism?” or “What interventions does the superintendent recommend?” to explore the data interactively.
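
The app's source is not reproduced on this page. As a hedged illustration of how a Gradio chat interface with clickable examples can be assembled, the sketch below uses `gr.ChatInterface`. `answer` is a placeholder responder and `build_app` is a hypothetical helper. The real Step 3 app presumably routes messages to the RAG query engine instead.

```python
# Hypothetical sketch of a Gradio chat UI for the EduHack track.
# This is not the demo's actual Step 3 code.
EXAMPLES = [
    "What are the most significant achievement gaps in Boston public schools?",
    "How does transportation affect student attendance and outcomes?",
    "What technology access barriers exist for students and teachers?",
]


def answer(message: str, history: list) -> str:
    """Placeholder responder; a real app would call the RAG query engine here."""
    return f"(placeholder) You asked: {message}"


def build_app():
    """Build the chat interface; Gradio is imported lazily, only when needed."""
    import gradio as gr

    return gr.ChatInterface(
        fn=answer,
        title="EduHack",
        description=(
            "Analyze Boston public schools equity, achievement gaps, "
            "and student outcomes data"
        ),
        examples=EXAMPLES,
    )
```

Calling `build_app().launch()` would start the local server. It is left out of the block so the module can be imported without side effects.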

Querying from code

Here’s how the RAG pipeline loads and queries this dataset:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure local AI
Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="all-MiniLM-L6-v2")

# Load the EduHack dataset
data_file = "data/eduhack_boston_schools.txt"
documents = SimpleDirectoryReader(input_files=[data_file]).load_data()

# Build vector index
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(streaming=True, similarity_top_k=3)

# Query
response = query_engine.query(
    "What are the most significant achievement gaps in Boston public schools?"
)
response.print_response_stream()
See Step 2: RAG with civic data for the full implementation.
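
To make the `similarity_top_k=3` setting more concrete, here is a toy, standard-library-only illustration of top-k retrieval by cosine similarity. It is not how LlamaIndex is implemented: the bag-of-words `embed` function is a stand-in for the MiniLM embedding model, and the chunk texts are lines quoted from this page rather than the chunks the real index produces.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "District-wide chronic absenteeism: 38.2%",
    "Students with 45+ minute commutes: 2.3x more likely to be chronically absent",
    "18% unreliable home internet, 7% no internet at all",
    "45% of teachers feel unprepared to integrate AI tools",
]

results = top_k("How many students lack home internet?", chunks, k=2)
for chunk in results:
    print(chunk)
```

In the real pipeline the retrieved chunks are passed to the LLM as context, so a larger `similarity_top_k` gives the model more evidence at the cost of a longer prompt.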
