Swap in your own data

The CivicHacks demo ships with four civic datasets, but you can replace them with your own data. The RAG pipeline supports several file formats and works with any text-based content.

Supported file types

The demo supports these file formats out of the box:

Format	Extension	Notes
Plain text	`.txt`	Simplest option, used in the demo
PDF documents	`.pdf`	Requires `llama-index-readers-file` (already in requirements)
CSV spreadsheets	`.csv`	Read as text content
Word documents	`.docx`	Requires `llama-index-readers-file`
Web pages	N/A	Use `SimpleWebPageReader` from LlamaIndex (in the separate `llama-index-readers-web` package)

The llama-index-readers-file package is already included in requirements.txt, so PDF and DOCX support is ready to use.
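If you want to check ahead of time which files in a folder match the table above, a small filter is enough. This is a minimal sketch, not the demo's own code: `SUPPORTED_EXTENSIONS` and `find_supported_files` are hypothetical names, and web pages are left out because they are loaded by URL rather than from disk.

```python
from pathlib import Path

# Hypothetical constant mirroring the table above (web pages are loaded by URL, not from disk).
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".csv", ".docx"}


def find_supported_files(directory: Path) -> list[Path]:
    """Return regular files in `directory` with a supported extension, sorted by name."""
    return sorted(
        p
        for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For example, `find_supported_files(Path("userdata"))` lists the files a loader could pick up, while anything like `notes.md` or a subfolder is ignored. Lowercasing the suffix means `REPORT.PDF` is treated the same as `report.pdf`.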

Quick start: Bring your own data

The fastest way to try your own data is with the BYOD (Bring Your Own Data) scripts:

Drop files into userdata/

Place your files in the userdata/ directory:

cp ~/Downloads/my-report.pdf userdata/

Run the BYOD script

The script auto-discovers files in userdata/:

# Auto-discover and select from available files
python scripts/demo_step4_byod.py

# Or specify a file directly
python scripts/demo_step4_byod.py path/to/your/file.txt

# Load ALL files for cross-document exploration
python scripts/demo_step4_byod.py --all
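The three invocations above imply an optional file path plus an `--all` switch. The sketch below is a hypothetical reconstruction of that interface with `argparse`, useful if you write your own variant; the real script may parse its arguments differently, and `build_parser` is an illustrative name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for: no arguments, a single file path, or --all."""
    parser = argparse.ArgumentParser(description="Ask questions about your own files")
    # nargs="?" makes the path optional, so running with no arguments falls back to auto-discovery.
    parser.add_argument("path", nargs="?", help="File to load; omit to auto-discover files in userdata/")
    # store_true gives a boolean flag that defaults to False.
    parser.add_argument("--all", action="store_true", help="Load every file in userdata/ into one index")
    return parser
```

`build_parser().parse_args(["--all"])` returns a namespace with `all=True` and `path=None`; passing a path instead sets `path` and leaves `all=False`.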

Use the web interface

For a browser-based experience with drag-and-drop:

python scripts/demo_step5_byod_app.py

Then open http://localhost:8861 and upload files directly in the browser.

Replace the demo datasets

To customize the track-specific data used in Steps 2 and 3, replace the files in the data/ directory:

Add your data files

Place your files in the data/ directory:

# Example: Replace the city track data
cp my-city-data.txt data/cityhack_boston_311.txt

Or add new files:

cp my-budget-report.pdf data/budget_track.pdf

Update track configuration in demo_step2_rag.py

Edit the TRACKS dictionary to reference your new files:

scripts/demo_step2_rag.py

TRACKS = {
    "eco": {
        "name": "🌿 EcoHack",
        "file": "ecohack_boston_environment.txt",
        "queries": [
            "Which neighborhoods have the worst air quality?",
            # ... more queries
        ],
    },
    "budget": {  # New track
        "name": "💰 Budget Analysis",
        "file": "budget_track.pdf",
        "queries": [
            "What are the biggest budget increases this year?",
            "Which departments face funding cuts?",
            "How does this budget address equity concerns?",
        ],
    },
}
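Before running Step 2 with a new track, it can help to confirm that every entry is complete and that its file actually exists. This is a minimal sketch, assuming each track needs `name`, `file`, and a non-empty `queries` list as in the example above, and that files resolve relative to `data/`; `check_tracks` is a hypothetical helper, not part of the demo.

```python
from pathlib import Path


def check_tracks(tracks: dict, data_dir: Path) -> list[str]:
    """Return human-readable problems; an empty list means every track looks usable."""
    problems = []
    for key, track in tracks.items():
        missing = [field for field in ("name", "file", "queries") if field not in track]
        if missing:
            problems.append(f"{key}: missing {', '.join(missing)}")
            continue  # further checks need these fields
        if not (data_dir / track["file"]).is_file():
            problems.append(f"{key}: {track['file']} not found in {data_dir}")
        if not track["queries"]:
            problems.append(f"{key}: no example queries")
    return problems
```

For instance, `check_tracks(TRACKS, Path("data"))` would flag a typo such as `budget_trak.pdf` before the pipeline tries to load it.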

Update track configuration in demo_step3_app.py

Update three dictionaries in the web app script:

scripts/demo_step3_app.py

TRACKS = {
    "🌿 EcoHack — Boston Environment": "ecohack_boston_environment.txt",
    "💰 Budget Analysis": "budget_track.pdf",  # Add your track
}

TRACK_DESCRIPTIONS = {
    "💰 Budget Analysis": "Analyze the city budget, funding priorities, and equity.",
}

EXAMPLE_QUESTIONS = {
    "💰 Budget Analysis": [
        "What are the biggest budget increases?",
        "Which departments face cuts?",
        "How is equity addressed in funding decisions?",
    ],
}
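Because the three dictionaries are keyed by the same display label, a small mismatch (a different emoji or an extra space) leaves a track without its description or example questions. The check below is a hypothetical helper, assuming every track in `TRACKS` is meant to have an entry in both `TRACK_DESCRIPTIONS` and `EXAMPLE_QUESTIONS`; keys must match exactly.

```python
def check_app_config(tracks: dict, descriptions: dict, examples: dict) -> list[str]:
    """List track labels that are missing from TRACK_DESCRIPTIONS or EXAMPLE_QUESTIONS."""
    problems = []
    for label in tracks:
        # Exact string comparison: emoji, spacing, and punctuation must be identical.
        if label not in descriptions:
            problems.append(f"{label}: no entry in TRACK_DESCRIPTIONS")
        if label not in examples:
            problems.append(f"{label}: no entry in EXAMPLE_QUESTIONS")
    return problems
```

Calling `check_app_config(TRACKS, TRACK_DESCRIPTIONS, EXAMPLE_QUESTIONS)` at the bottom of the script is one way to catch a mistyped label early.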

Load data from web pages

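You can also load data directly from web pages using LlamaIndex’s SimpleWebPageReader, which ships in the separate llama-index-readers-web package:

from llama_index.readers.web import SimpleWebPageReader

# html_to_text=True converts each page's HTML into plain text
# Replace the placeholder URL with the page you want to index
documents = SimpleWebPageReader(html_to_text=True).load_data(["https://example.com/your-page"])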

When loading web pages, the reader extracts visible text content. For best results, use pages with clean, well-structured text rather than heavily dynamic JavaScript applications.
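To see why JavaScript-heavy pages give poor results, here is a standard-library illustration of visible-text extraction. It is not SimpleWebPageReader's implementation, and `VisibleTextExtractor` and `visible_text` are hypothetical names. The point is that a page which renders its content with JavaScript ships almost no text in its raw HTML, so any reader that fetches the HTML without running scripts has little to extract.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script>, <style>, and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

On a static page, `visible_text` returns the headings and paragraphs; on a single-page app whose body is just `<div id="root"></div>` plus a script tag, it returns an empty string.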

Multi-file exploration

The BYOD scripts support loading multiple files into a single index for cross-document analysis:

# Load ALL files in userdata/ directory
python scripts/demo_step4_byod.py --all

This builds one unified vector index, enabling questions like:

“Compare the findings across these reports”
“What themes are common across all documents?”
“Which report mentions climate change most?”
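Cross-document questions work because every chunk in the combined index remembers which file it came from. The toy below illustrates that idea with keyword overlap standing in for vector similarity; it is not the demo's retrieval code, and `build_combined_index` and `search` are hypothetical names.

```python
def build_combined_index(docs: dict[str, str], chunk_words: int = 50) -> list[dict]:
    """Split every document into word chunks, each tagged with its source file name."""
    index = []
    for file_name, text in docs.items():
        words = text.split()
        for start in range(0, len(words), chunk_words):
            index.append({"file_name": file_name, "text": " ".join(words[start:start + chunk_words])})
    return index


def search(index: list[dict], query: str, top_k: int = 3) -> list[dict]:
    """Rank chunks by shared query words (a keyword stand-in for embedding similarity)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(chunk["text"].lower().split())), chunk) for chunk in index]
    scored = [(score, chunk) for score, chunk in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep index order
    return [chunk for _, chunk in scored[:top_k]]
```

Because one query searches all chunks at once, the top hits can come from different files, and each hit's `file_name` tells you which report it came from.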

The auto-discovery logic works as follows:

Files in userdata/	Behavior
0 files	Prompts for a file path
1 file	Automatically uses that file
2+ files	Shows a numbered list to pick from (or type `a` to load all)
`--all` flag	Loads every file into a single combined index
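The decision table can be expressed as two small functions: one that picks the behavior and one that interprets the menu answer. This is a sketch of the logic as described, not the script's source; the function names and returned labels are hypothetical.

```python
from pathlib import Path


def discovery_action(files: list[Path], load_all: bool) -> str:
    """Return a label for the behavior from the table above."""
    if load_all:
        return "load_all"         # --all: every file into one combined index
    if not files:
        return "prompt_for_path"  # nothing in userdata/
    if len(files) == 1:
        return "use_single_file"  # exactly one file: use it automatically
    return "show_menu"            # 2+ files: numbered list, or 'a' for all


def parse_menu_choice(answer: str, files: list[Path]) -> list[Path]:
    """Interpret a menu answer: 'a' selects all files, '1'..'N' selects one."""
    answer = answer.strip().lower()
    if answer == "a":
        return list(files)
    index = int(answer)  # raises ValueError on non-numeric input
    if not 1 <= index <= len(files):
        raise ValueError(f"choose 1-{len(files)} or 'a'")
    return [files[index - 1]]
```

Keeping the menu numbers 1-based matches what a user sees in a numbered list; the conversion to a 0-based index happens in one place.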

File validation

The BYOD scripts validate files before processing:

File existence: Checks that the path is valid
Extension check: Ensures the file type is supported
Size limits: Validates reasonable file sizes
Graceful errors: Skips unreadable files when loading multiple documents
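The checks above can be sketched as a per-file validator plus a loop that skips bad files instead of stopping. This is a minimal sketch under stated assumptions: the 50 MB `MAX_BYTES` limit and the empty-file check are hypothetical, since the scripts' actual size limits are not documented here, and the helper names are illustrative.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".csv", ".docx"}
MAX_BYTES = 50 * 1024 * 1024  # hypothetical limit, not the demo's documented value


def validate_file(path: Path, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the file passed every check."""
    if not path.is_file():
        return [f"{path}: file not found"]
    problems = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"{path.name}: unsupported type '{path.suffix}'")
    size = path.stat().st_size
    if size == 0:
        problems.append(f"{path.name}: file is empty")
    elif size > max_bytes:
        problems.append(f"{path.name}: {size} bytes exceeds limit of {max_bytes}")
    return problems


def usable_files(paths: list[Path]) -> list[Path]:
    """Keep files that pass validation; report and skip the rest."""
    keep = []
    for path in paths:
        problems = validate_file(path)
        if problems:
            print("Skipping:", "; ".join(problems))
        else:
            keep.append(path)
    return keep
```

Returning a list of problems, rather than raising on the first one, lets a multi-file load report every issue and continue with the files that are fine.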

The script then prints a file analysis summary like this:

────────────────────────────────────────────────────────────
  File Analysis
────────────────────────────────────────────────────────────

   File:      boston_budget_2026.pdf
   Type:      PDF document
   Size:      2.4 MB
   Modified:  February 18, 2026

   Content:   3 document(s), 45,230 characters, ~8,120 words
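The summary fields can be computed with a few formatting helpers. This is a sketch, not the script's code: the function names are hypothetical, the 1024-based size units are an assumption, and the whitespace word count is one possible estimate (the real output marks its word count as approximate with `~`).

```python
from datetime import datetime


def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units (an assumption; the demo may round differently)."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB"):
        if size < 1024:
            return f"{num_bytes} bytes" if unit == "bytes" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def format_modified(ts: datetime) -> str:
    """Render a date like 'February 18, 2026' without a zero-padded day."""
    return f"{ts:%B} {ts.day}, {ts.year}"


def content_summary(texts: list[str]) -> str:
    """Document count, total characters, and a whitespace-based word estimate."""
    chars = sum(len(t) for t in texts)
    words = sum(len(t.split()) for t in texts)
    return f"{len(texts)} document(s), {chars:,} characters, ~{words:,} words"
```

For a file on disk, the inputs would come from `path.stat().st_size` and `datetime.fromtimestamp(path.stat().st_mtime)`, while the texts come from the loaded documents.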

Next steps

Change the AI model

Customize the UI
