Overview

The CivicHacks Demo is built on a four-layer stack that runs entirely locally with zero cloud dependencies. Every component is open source and free to use.

Architecture diagram

┌─────────────────────────────────────────────┐
│              Gradio Web UI (Step 3)         │  ← Browser-based chat interface
├─────────────────────────────────────────────┤
│          LlamaIndex RAG Pipeline (Step 2)   │  ← Retrieval Augmented Generation
│   ┌──────────────┐    ┌──────────────────┐  │
│   │ Vector Index │    │ HuggingFace      │  │
│   │ (in-memory)  │    │ Embeddings       │  │
│   │              │    │ (all-MiniLM-L6)  │  │
│   └──────────────┘    └──────────────────┘  │
├─────────────────────────────────────────────┤
│            Ollama + Llama 3.1 (Step 1)      │  ← Local LLM inference
├─────────────────────────────────────────────┤
│            Civic Data Files (data/)         │  ← .txt datasets per track
└─────────────────────────────────────────────┘

System components

Gradio Web UI (Step 3)
The browser-based chat interface, which provides:
  • Dynamic header that updates when switching tracks
  • Track selector dropdown for all four civic datasets
  • Chat interface with message history
  • Dynamic example questions per track
  • Per-query cost comparison (local vs cloud)
  • Red Hat-themed styling
Location: scripts/demo_step3_app.py
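The per-query cost comparison comes down to arithmetic on token counts, such as the ones Ollama returns. The sketch below assumes a cloud API that bills per million input and output tokens. The prices in the usage line are placeholders, not quotes from any provider. `estimate_costs` and `CostComparison` are hypothetical names, not the app's actual code.

```python
from dataclasses import dataclass


@dataclass
class CostComparison:
    local_usd: float  # Local inference has no per-token fee (electricity and hardware ignored)
    cloud_usd: float  # What the same tokens would cost on a metered cloud API


def estimate_costs(
    prompt_tokens: int,
    completion_tokens: int,
    input_usd_per_million: float,
    output_usd_per_million: float,
) -> CostComparison:
    """Price one query's tokens locally (free per token) and on a metered cloud API."""
    cloud = (
        prompt_tokens * input_usd_per_million
        + completion_tokens * output_usd_per_million
    ) / 1_000_000
    return CostComparison(local_usd=0.0, cloud_usd=cloud)


# Placeholder prices (assumptions): $3 per million input tokens, $15 per million output tokens
result = estimate_costs(prompt_tokens=1200, completion_tokens=300,
                        input_usd_per_million=3.0, output_usd_per_million=15.0)
print(f"local ${result.local_usd:.4f} vs cloud ${result.cloud_usd:.4f}")
# local $0.0000 vs cloud $0.0081
```

Because RAG prompts carry retrieved context, input tokens usually dominate the count. That is why the sketch prices input and output separately.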

LlamaIndex RAG Pipeline (Step 2)
The retrieval-augmented generation layer, which:
  • Loads documents from the data/ directory
  • Builds in-memory vector indices for fast search
  • Uses HuggingFace embeddings (all-MiniLM-L6-v2, ~80 MB)
  • Retrieves relevant chunks before generating responses
  • Caches indices for instant track switching
Location: scripts/demo_step2_rag.py, scripts/demo_step3_app.py
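LlamaIndex splits each loaded document into overlapping chunks (nodes) before embedding them. Its default splitter counts tokens and prefers sentence boundaries. The sketch below is a simplified stand-in that splits on whitespace-separated words with a fixed overlap, which is enough to show why adjacent chunks share text. `chunk_words`, the chunk size and the overlap are illustrative assumptions, not the demo's settings.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of `chunk_size` words; each shares `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # How far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # This chunk already reaches the end of the text
    return chunks


sample = "Heat islands raise summer temperatures in dense neighborhoods with little tree cover"
for chunk in chunk_words(sample, chunk_size=6, overlap=2):
    print(chunk)
```

The overlap keeps a fact that straddles a chunk boundary retrievable from at least one chunk.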

Ollama + Llama 3.1 (Step 1)
The local LLM inference engine, which:
  • Runs the Llama 3.1 8B model locally (4.7 GB download)
  • Serves an HTTP API (including OpenAI-compatible endpoints) on localhost:11434
  • Supports streaming responses for real-time generation
  • Works on CPU or GPU (Apple Silicon recommended)
  • Returns token counts for cost estimation
Location: Ollama service (system-level)
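Ollama's native `/api/generate` endpoint streams newline-delimited JSON. Each line carries a `response` text fragment. The final line (`"done": true`) adds counters such as `prompt_eval_count`, `eval_count` and `eval_duration` (in nanoseconds). The sketch below parses such a stream offline. The sample lines are hand-written, abbreviated stand-ins for real output, and `collect_stream` is a hypothetical helper.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> dict:
    """Join streamed text fragments and read token counts from the final chunk."""
    text_parts: list[str] = []
    stats: dict = {}
    for line in lines:
        if not line.strip():
            continue  # Ignore blank lines between JSON objects
        chunk = json.loads(line)
        text_parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            # Final chunk: counters usable for cost estimates and speed measurements
            generated = chunk.get("eval_count", 0)
            duration_ns = chunk.get("eval_duration", 0)  # Reported in nanoseconds
            stats = {
                "prompt_tokens": chunk.get("prompt_eval_count", 0),
                "completion_tokens": generated,
                "tokens_per_second": generated * 1e9 / duration_ns if duration_ns else None,
            }
    return {"text": "".join(text_parts), **stats}


# Hand-written, abbreviated sample stream (not captured from a real run)
sample = [
    '{"model": "llama3.1", "response": "Boston ", "done": false}',
    '{"model": "llama3.1", "response": "311 data", "done": false}',
    '{"model": "llama3.1", "response": "", "done": true, '
    '"prompt_eval_count": 412, "eval_count": 2, "eval_duration": 100000000}',
]
print(collect_stream(sample))
```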

Civic Data Files (data/)
Plain text datasets covering the four hackathon tracks:
  • EcoHack: ecohack_boston_environment.txt (air quality, heat islands, climate resilience)
  • CityHack: cityhack_boston_311.txt (311 service requests, equity gaps)
  • EduHack: eduhack_boston_schools.txt (achievement gaps, absenteeism, tech access)
  • JusticeHack: justicehack_ma_justice.txt (incarceration disparities, policing data)
Location: data/ directory

Data flow

Here’s what happens when a user asks a question:
┌─────────────────────────────────────────────────────────┐
│ 1. User asks question (terminal or Gradio UI)           │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│ 2. LlamaIndex converts question to vector               │
│    (using HuggingFace all-MiniLM-L6-v2 embeddings)      │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│ 3. Vector index retrieves 3 most relevant chunks        │
│    from the civic dataset                               │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│ 4. Retrieved context + question sent to Llama 3.1       │
│    via Ollama (localhost:11434)                         │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│ 5. LLM generates grounded answer citing real data       │
│    (streams back token by token)                        │
└───────────────────────┬─────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────────┐
│ 6. Response displayed to user with cost comparison      │
└─────────────────────────────────────────────────────────┘
Every step runs locally on your machine. No questions or documents are sent to the cloud, and no API keys are required.
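Steps 2 to 4 can be approximated in a few lines of plain Python:

- embed the question
- rank chunks by cosine similarity (a common choice for sentence embeddings)
- keep the top 3
- place them in the prompt

The tiny 3-dimensional vectors below are made up so the example runs without a model; all-MiniLM-L6-v2 actually produces 384-dimensional embeddings. The prompt template is an assumption, since LlamaIndex uses its own templates. `top_k_chunks` and `build_prompt` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (text, vector) chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, retrieved: list[tuple[str, list[float]]]) -> str:
    """Place the retrieved chunk texts ahead of the question (assumed template)."""
    context = "\n\n".join(text for text, _ in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Made-up 3-d vectors standing in for real 384-d embeddings
chunks = [
    ("chunk about air quality", [0.9, 0.1, 0.0]),
    ("chunk about heat islands", [0.8, 0.3, 0.1]),
    ("chunk about school absenteeism", [0.0, 0.2, 0.9]),
    ("chunk about tree canopy", [0.7, 0.5, 0.0]),
]
query = [1.0, 0.2, 0.0]  # Pretend embedding of the user's question
best = top_k_chunks(query, chunks, k=3)
prompt = build_prompt("Where is air quality worst?", best)
print([text for text, _ in best])
print(prompt)
```

The unrelated school chunk is dropped, so the model only sees the three environment chunks. This is how retrieval keeps answers grounded in the relevant dataset.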

Hardware profiles

Performance depends on your hardware. Ollama uses a supported GPU when one is available and falls back to the CPU otherwise:
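| Hardware                          | Inference speed | Memory usage | Notes                 |
|-----------------------------------|-----------------|--------------|-----------------------|
| Apple Silicon M1/M2/M3/M4 (base)  | 15-25 tok/s     | 4-5 GB       | Ideal for demos       |
| Apple Silicon Pro/Max             | 20-35 tok/s     | 4-5 GB       | Excellent performance |
| x86 laptop (CPU-only)             | 3-8 tok/s       | 4-5 GB       | Works, but slower     |
| x86 desktop (discrete GPU)        | 15-40 tok/s     | 4-5 GB       | Fast generation       |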
Apple Silicon Macs are ideal for this demo. Unified memory lets the GPU work directly with the 4-5 GB Llama 3.1 8B model, so no discrete graphics card is needed.
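The throughput figures translate directly into wait time. Generation time is roughly the number of output tokens divided by tokens per second; prompt processing and model loading add more on top. The sketch assumes a 300-token answer, an illustrative length rather than a measured one.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate streaming time for an answer, ignoring prompt processing and model loading."""
    return output_tokens / tokens_per_second


ANSWER_TOKENS = 300  # Illustrative answer length (assumption), not a measured value

# Lower and upper ends of selected speed ranges from the hardware table
profiles = [
    ("x86 laptop (CPU-only), 3 tok/s", 3),
    ("x86 laptop (CPU-only), 8 tok/s", 8),
    ("Apple Silicon base, 15 tok/s", 15),
    ("Apple Silicon Pro/Max, 35 tok/s", 35),
]
for label, tps in profiles:
    print(f"{label}: ~{generation_seconds(ANSWER_TOKENS, tps):.0f} s")
```

At the low end of CPU-only speeds, a single answer can take over a minute and a half. Streaming matters most on that hardware, because users see text as soon as it is generated.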

System requirements

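| Component | Minimum      | Recommended                       |
|-----------|--------------|-----------------------------------|
| RAM       | 8 GB         | 16 GB                             |
| GPU VRAM  | Not required | 8+ GB                             |
| Storage   | 10 GB free   | 20 GB free                        |
| CPU       | 4-core       | Apple Silicon or recent Intel/AMD |
| Python    | 3.10+        | 3.12+                             |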
First runs are slower because models need to load into memory and the embedding model (~80 MB) downloads on first use. Always pre-warm all steps before a live demo.
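A quick preflight check can catch common setup problems before a live demo. This sketch checks only the Python version and free disk space against the minimums in the requirements table. RAM is left out because the standard library has no portable way to read it. The thresholds come from the table; `preflight` and its structure are illustrative.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # Minimum Python version from the requirements table
MIN_FREE_GB = 10      # Minimum free storage from the requirements table


def preflight(path: str = ".") -> list[str]:
    """Return a list of problems; an empty list means the checked minimums are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    free_gb = shutil.disk_usage(path).free / 1024**3  # Bytes to GiB
    if free_gb < MIN_FREE_GB:
        problems.append(f"Only {free_gb:.1f} GB free on {path}; the minimum is {MIN_FREE_GB} GB")
    return problems


issues = preflight()
print("Preflight OK" if not issues else "\n".join(issues))
```

Running this together with one pass through each demo step also covers the pre-warm, because that pass downloads and loads the models.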

Caching strategy

Index caching (Step 3 app)

The Gradio app caches built indices in a global dictionary:
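# Sketch: the track-to-file mapping is illustrative; the real app's track names may differ
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

TRACK_FILES = {  # Hypothetical mapping from track name to dataset file
    "EcoHack": "data/ecohack_boston_environment.txt",
    "CityHack": "data/cityhack_boston_311.txt",
    "EduHack": "data/eduhack_boston_schools.txt",
    "JusticeHack": "data/justicehack_ma_justice.txt",
}
index_cache = {}  # Global index cache: track name -> VectorStoreIndex

def build_index(track_name):
    if track_name in index_cache:  # Cache hit: reuse the already built index
        return index_cache[track_name]
    # Cache miss: load the track's dataset and embed it with Settings.embed_model
    documents = SimpleDirectoryReader(input_files=[TRACK_FILES[track_name]]).load_data()
    index = VectorStoreIndex.from_documents(documents)
    index_cache[track_name] = index
    return index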
This makes switching tracks instant after the first load.
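The same build-once behavior can also be expressed with the standard library's `functools.lru_cache`, which keeps one cached result per distinct argument. This is an alternative to the explicit dictionary, not what the demo uses. The function body below is a stand-in for real index construction, so the example runs anywhere.

```python
from functools import lru_cache

build_calls = []  # Records each real build so the caching is observable


@lru_cache(maxsize=None)  # Unbounded: one entry per track, like the dictionary cache
def build_index(track_name: str) -> str:
    build_calls.append(track_name)
    # Stand-in for loading the dataset and building a VectorStoreIndex
    return f"index for {track_name}"


build_index("EcoHack")
build_index("CityHack")
build_index("EcoHack")  # Served from the cache; no rebuild
print(build_calls)  # ['EcoHack', 'CityHack']
```

The explicit dictionary is easier to inspect and lets you evict a single track (`index_cache.pop(name)`). `lru_cache` can only be cleared as a whole with `build_index.cache_clear()`.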

Embedding model caching

The HuggingFace embedding model downloads once to ~/.cache/huggingface/hub/ and is reused across all scripts.

Ollama model caching

By default, Ollama keeps models in ~/.ollama/models/ (set the OLLAMA_MODELS environment variable to change this). Once downloaded, models are available offline.
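To confirm a machine is ready to run offline, you can check that both caches are populated. The sketch assumes the default locations above, the Hugging Face hub's `models--<org>--<name>` folder naming, and an Ollama layout with one manifest directory per pulled model. It does not honor overrides such as `HF_HOME` or `OLLAMA_MODELS`, and `offline_ready` is a hypothetical helper.

```python
from pathlib import Path


def offline_ready(home: Path) -> dict[str, bool]:
    """Report whether the default model caches under `home` look populated."""
    # Assumed hub folder name for sentence-transformers/all-MiniLM-L6-v2
    hf_model = home / ".cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2"
    # Assumed Ollama layout: manifests/<registry>/library/<model>
    ollama_model = home / ".ollama/models/manifests/registry.ollama.ai/library/llama3.1"
    return {
        "embedding model cached": hf_model.is_dir(),
        "llama3.1 pulled": ollama_model.is_dir(),
    }


print(offline_ready(Path.home()))
```

If either value is False, run the demo once while online (or `ollama pull llama3.1`) before going offline.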

Configuration options

Key settings are defined in scripts/demo_step2_rag.py and scripts/demo_step3_app.py:
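# Requires llama-index-core, llama-index-llms-ollama and llama-index-embeddings-huggingface
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# LLM configuration: Llama 3.1 served by the local Ollama instance
Settings.llm = Ollama(
    model="llama3.1",
    request_timeout=120.0  # Seconds to wait for Ollama before giving up
)

# Embedding configuration
Settings.embed_model = HuggingFaceEmbedding(
    model_name="all-MiniLM-L6-v2"
)

# Query engine configuration (index is the VectorStoreIndex built for the active track)
query_engine = index.as_query_engine(
    streaming=True,
    similarity_top_k=3  # Retrieve 3 most relevant chunks
)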
You can swap LLMs by changing the model parameter. Try phi3:mini for faster generation or deepseek-r1:7b for stronger reasoning. Pull a model before using it, for example with ollama pull phi3:mini.
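If you switch models often, reading the tag from an environment variable avoids editing the scripts. This is a hypothetical convenience, not something the demo scripts do. The variable name `CIVICHACKS_LLM` is invented for illustration, and whichever tag you choose must already be pulled into Ollama.

```python
import os
from collections.abc import Mapping

DEFAULT_MODEL = "llama3.1"
# Tags mentioned in this guide; any other model pulled into Ollama would also work
KNOWN_MODELS = {"llama3.1", "phi3:mini", "deepseek-r1:7b"}


def choose_model(env: Mapping[str, str] = os.environ) -> str:
    """Return the model tag from the hypothetical CIVICHACKS_LLM variable, else the default."""
    model = env.get("CIVICHACKS_LLM", "").strip() or DEFAULT_MODEL
    if model not in KNOWN_MODELS:
        print(f"Note: {model!r} is not in KNOWN_MODELS; make sure it has been pulled")
    return model


print(choose_model())
```

With this helper, the LLM line in the configuration would read `Settings.llm = Ollama(model=choose_model(), request_timeout=120.0)`.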
