The CivicHacks demo ships with four civic datasets, but you can swap them out for your own data. The RAG pipeline supports several file formats and works with any text-based content.

Supported file types

The demo supports these file formats out of the box:
| Format | Extension | Notes |
|---|---|---|
| Plain text | .txt | Simplest option, used in the demo |
| PDF documents | .pdf | Requires llama-index-readers-file (already in requirements) |
| CSV spreadsheets | .csv | Read as text content |
| Word documents | .docx | Requires llama-index-readers-file |
| Web pages | N/A | Use SimpleWebPageReader from LlamaIndex |
The llama-index-readers-file package is already included in requirements.txt, so PDF and DOCX support is ready to use.

Quick start: Bring your own data

The fastest way to try your own data is with the BYOD (Bring Your Own Data) scripts:

Step 1: Drop files into userdata/

Place your files in the userdata/ directory:
cp ~/Downloads/my-report.pdf userdata/

Step 2: Run the BYOD script

The script auto-discovers files in userdata/:
# Auto-discover and select from available files
python scripts/demo_step4_byod.py

# Or specify a file directly
python scripts/demo_step4_byod.py path/to/your/file.txt

# Load ALL files for cross-document exploration
python scripts/demo_step4_byod.py --all
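The three invocation forms above fit an optional positional path plus a boolean --all flag. As a hypothetical sketch (not the actual parser in demo_step4_byod.py; the function name parse_byod_args is made up for illustration), the command line could be parsed with the standard library's argparse like this:

```python
import argparse


def parse_byod_args(argv=None):
    """Parse a BYOD-style command line (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Explore your own documents with RAG."
    )
    # Optional positional path: when omitted, the caller auto-discovers
    # files in userdata/ instead.
    parser.add_argument("path", nargs="?", default=None,
                        help="File to load (default: auto-discover)")
    # Boolean flag: load every file into one combined index.
    parser.add_argument("--all", action="store_true",
                        help="Load all files in userdata/")
    return parser.parse_args(argv)
```

For example, parse_byod_args(["--all"]) returns a namespace with all=True and path=None, which a caller would treat as "load everything", while parse_byod_args([]) leaves both unset and falls through to auto-discovery.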

Step 3: Use the web interface

For a browser-based experience with drag-and-drop:
python scripts/demo_step5_byod_app.py
Then open http://localhost:8861 and upload files directly in the browser.

Replace the demo datasets

To customize the track-specific data used in Steps 2 and 3, replace the files in the data/ directory:

Step 1: Add your data files

Place your files in the data/ directory:
# Example: Replace the city track data
cp my-city-data.txt data/cityhack_boston_311.txt
Or add new files:
cp my-budget-report.pdf data/budget_track.pdf

Step 2: Update track configuration in demo_step2_rag.py

Edit the TRACKS dictionary to reference your new files:
scripts/demo_step2_rag.py
TRACKS = {
    "eco": {
        "name": "🌿 EcoHack",
        "file": "ecohack_boston_environment.txt",
        "queries": [
            "Which neighborhoods have the worst air quality?",
            # ... more queries
        ],
    },
    "budget": {  # New track
        "name": "💰 Budget Analysis",
        "file": "budget_track.pdf",
        "queries": [
            "What are the biggest budget increases this year?",
            "Which departments face funding cuts?",
            "How does this budget address equity concerns?",
        ],
    },
}
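After editing TRACKS, a quick sanity check can catch a mistyped file name before the demo tries to index it. The helper below, find_missing_track_files, is hypothetical (it is not part of the demo scripts) and assumes each track's "file" entry is a name relative to the data/ directory, as in the example above:

```python
from pathlib import Path


def find_missing_track_files(tracks, data_dir="data"):
    """Return {track_key: file_name} for every track whose file is absent."""
    data_path = Path(data_dir)
    missing = {}
    for key, track in tracks.items():
        # Each track's "file" is resolved relative to the data directory.
        if not (data_path / track["file"]).is_file():
            missing[key] = track["file"]
    return missing
```

Running it against your edited dictionary, for example find_missing_track_files(TRACKS), and getting back an empty dict means every configured file is in place.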

Step 3: Update track configuration in demo_step3_app.py

Update three dictionaries in the web app script:
scripts/demo_step3_app.py
TRACKS = {
    "🌿 EcoHack — Boston Environment": "ecohack_boston_environment.txt",
    "💰 Budget Analysis": "budget_track.pdf",  # Add your track
}

TRACK_DESCRIPTIONS = {
    "💰 Budget Analysis": "Analyze the city budget, funding priorities, and equity.",
}

EXAMPLE_QUESTIONS = {
    "💰 Budget Analysis": [
        "What are the biggest budget increases?",
        "Which departments face cuts?",
        "How is equity addressed in funding decisions?",
    ],
}
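Because the three dictionaries are keyed by the same display label, a label typed slightly differently in one of them (a different emoji or dash, say) would leave a track without its description or example questions. This hypothetical check, not part of demo_step3_app.py, lists every TRACKS label that is missing from either companion dictionary:

```python
def find_unconfigured_labels(tracks, descriptions, examples):
    """Map each TRACKS label to the companion dictionaries it is missing from."""
    problems = {}
    for label in tracks:
        # Check the label against both companion dictionaries by name.
        missing = [name for name, table in
                   (("TRACK_DESCRIPTIONS", descriptions),
                    ("EXAMPLE_QUESTIONS", examples))
                   if label not in table]
        if missing:
            problems[label] = missing
    return problems
```

Calling find_unconfigured_labels(TRACKS, TRACK_DESCRIPTIONS, EXAMPLE_QUESTIONS) after an edit should return an empty dict once every track is fully configured.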

Load data from web pages

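You can also load data directly from web pages using LlamaIndex’s SimpleWebPageReader. It lives in the separate llama-index-readers-web package, so install that first if needed (pip install llama-index-readers-web):
from llama_index.readers.web import SimpleWebPageReader

# html_to_text=True converts the HTML to plain text (requires html2text)
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://example.com/your-page"]
)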
When loading web pages, the reader extracts visible text content. For best results, use pages with clean, well-structured text rather than heavily dynamic JavaScript applications.
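To see why dynamic pages give poor results, it helps to look at what "visible text" means. The standard-library sketch below (a hypothetical VisibleTextExtractor, not how SimpleWebPageReader is implemented) keeps text nodes and drops the contents of script and style tags. Text that a JavaScript app renders only after the page loads never appears in the downloaded HTML, so an extractor of this kind cannot see it.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect human-visible text, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)
```

An empty placeholder such as <div id="app"></div>, typical of single-page apps, contributes nothing to the result, which is exactly the content a RAG index would be missing.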

Multi-file exploration

The BYOD scripts support loading multiple files into a single index for cross-document analysis:
# Load ALL files in userdata/ directory
python scripts/demo_step4_byod.py --all
This builds one unified vector index, enabling questions like:
  • “Compare the findings across these reports”
  • “What themes are common across all documents?”
  • “Which report mentions climate change most?”
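Cross-document questions like these depend on the combined collection remembering which file each piece of text came from. The toy sketch below swaps the vector index for plain keyword counting so it runs with only the standard library; load_corpus and mentions_by_file are hypothetical helpers, and the real scripts use LlamaIndex rather than this code. It reads every .txt file in a directory into one collection keyed by file name, which is enough to answer a question like “Which report mentions climate change most?”

```python
from pathlib import Path


def load_corpus(directory):
    """Read every .txt file in `directory` into {file_name: text}."""
    corpus = {}
    for path in sorted(Path(directory).glob("*.txt")):
        try:
            corpus[path.name] = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            # Skip unreadable files instead of aborting the whole load.
            continue
    return corpus


def mentions_by_file(corpus, phrase):
    """Count case-insensitive occurrences of `phrase` in each document."""
    needle = phrase.lower()
    counts = {name: text.lower().count(needle) for name, text in corpus.items()}
    # Most mentions first; ties keep alphabetical file order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

A vector index answers fuzzier questions than exact phrase counting can, but the design point is the same: one shared collection, with each entry tagged by its source document.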
The auto-discovery logic works as follows:
| Files in userdata/ | Behavior |
|---|---|
| 0 files | Prompts for a file path |
| 1 file | Automatically uses that file |
| 2+ files | Shows a numbered list to pick from (or type a to load all) |
| --all flag | Loads every file into a single combined index |
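The decision table above can be expressed as one small function. This is a hypothetical sketch, not the code in demo_step4_byod.py: resolve_selection takes the discovered files, whether --all was passed, and the user's reply to the numbered prompt (assumed to be numbered from 1, or a for all), and returns the files to load, or None when the caller should prompt for a path instead.

```python
def resolve_selection(files, load_all=False, choice=None):
    """Apply the userdata/ auto-discovery rules (hypothetical sketch).

    files:    discovered file paths, in the order they were listed
    load_all: True when --all was passed
    choice:   the user's reply to the numbered list ("1", "2", ... or "a")
    Returns a list of files to load, or None if the caller must prompt
    for a file path.
    """
    if load_all:
        return list(files)          # --all: everything into one index
    if not files:
        return None                 # 0 files: ask for a path
    if len(files) == 1:
        return [files[0]]           # 1 file: use it automatically
    # 2+ files: honor the user's pick from the numbered list.
    if choice is None:
        raise ValueError("a choice is required when several files exist")
    if choice.strip().lower() == "a":
        return list(files)
    index = int(choice) - 1         # the list is assumed to start at 1
    if not 0 <= index < len(files):
        raise ValueError(f"choice must be between 1 and {len(files)}")
    return [files[index]]
```

Keeping the rules in a pure function like this, separate from the input() prompt, makes each row of the table easy to test on its own.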

File validation

The BYOD scripts validate files before processing:
  • File existence: Checks that the path is valid
  • Extension check: Ensures the file type is supported
  • Size limits: Validates reasonable file sizes
  • Graceful errors: Skips unreadable files when loading multiple documents
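A validation pass along these lines might look like the sketch below. The function names validate_file and usable_files, the empty-file check, and the 50 MB ceiling are assumptions for illustration; the extension set comes from the supported file types table, and the actual limits in the BYOD scripts may differ.

```python
from pathlib import Path

# Extensions from the supported file types table.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".csv", ".docx"}
# Hypothetical size ceiling; the real scripts may use a different limit.
MAX_BYTES = 50 * 1024 * 1024


def validate_file(path, max_bytes=MAX_BYTES):
    """Return a list of problems with `path`; an empty list means it is OK."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist or is not a file"]
    problems = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {p.suffix or '(none)'}")
    size = p.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > max_bytes:
        problems.append(f"file is {size} bytes; limit is {max_bytes}")
    return problems


def usable_files(paths):
    """Keep only paths that pass validation, reporting the rest."""
    good = []
    for path in paths:
        problems = validate_file(path)
        if problems:
            print(f"Skipping {path}: {'; '.join(problems)}")
        else:
            good.append(path)
    return good
```

Returning a list of problems rather than raising on the first one lets a multi-file load report every bad file and carry on with the rest.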
You’ll see detailed file analysis including:
────────────────────────────────────────────────────────────
  File Analysis
────────────────────────────────────────────────────────────

   File:      boston_budget_2026.pdf
   Type:      PDF document
   Size:      2.4 MB
   Modified:  February 18, 2026

   Content:   3 document(s), 45,230 characters, ~8,120 words
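The figures in that report are simple to derive. As a hypothetical sketch (the helper names and formatting rules are assumptions, not the script's code), the size can be scaled to the largest fitting unit using 1024-byte steps, and the counts summed over every document the reader returns; a PDF reader can return more than one document per file, for example one per page.

```python
def human_size(num_bytes):
    """Format a byte count like '2.4 MB' (1 KB = 1024 bytes assumed)."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{int(size)} bytes" if unit == "bytes" else f"{size:.1f} {unit}"
        size /= 1024


def content_summary(texts):
    """Summarize extracted document texts in the analysis panel's format."""
    chars = sum(len(t) for t in texts)
    words = sum(len(t.split()) for t in texts)  # whitespace-split estimate
    return f"{len(texts)} document(s), {chars:,} characters, ~{words:,} words"
```

Splitting on whitespace only approximates a word count, which is why the report marks it with a tilde.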

Next steps

  • Change the AI model: Try different models for speed or quality
  • Customize the UI: Modify the Gradio interface appearance
