The demo uses Llama 3.1 8B by default, but you can easily switch to different models based on your hardware constraints or quality requirements.

Available model options

Ollama provides a wide range of open-source models. Here are some recommended options:

Smaller/faster models

Ideal for limited hardware or faster inference:

| Model | Size | Speed | Use case |
| --- | --- | --- | --- |
| llama3.2:3b | 3B | Very fast | Quick responses, limited hardware |
| phi3:mini | 3.8B | Fast | General use on any laptop |
| gemma2:2b | 2B | Very fast | Maximum speed, basic tasks |

Balanced models

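Good quality with reasonable resource requirements:

| Model | Size | Speed | Use case |
| --- | --- | --- | --- |
| llama3.1 | 8B | Medium | Default - best balance (demo uses this) |
| llama3.2 | 3B | Very fast | Newer Llama release; the default tag resolves to the 3B model (no 8B variant exists) |
| mistral | 7B | Medium | Strong general-purpose model |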

Larger/better models

Require more RAM but provide higher quality:

| Model | Size | RAM needed | Use case |
| --- | --- | --- | --- |
| llama3.1:70b | 70B | ~40GB | Best quality, needs powerful hardware |
| mixtral | 47B | ~32GB | Strong reasoning, mixture of experts |

Specialized models

Optimized for specific tasks:

| Model | Size | Specialization |
| --- | --- | --- |
| deepseek-r1:7b | 7B | Strong reasoning tasks (MIT license) |
| codellama | 7B-34B | Code generation and analysis |

Apple Silicon Macs (M1/M2/M3/M4) handle 8B models beautifully at 15-25 tokens/second. CPU-only machines work fine at ~3-5 tokens/second.

Pull a new model

Download models using Ollama before using them:
1. Pull the model

Download your chosen model:
# Smaller/faster
ollama pull llama3.2:3b
ollama pull phi3:mini

# Larger/better (requires more RAM)
ollama pull llama3.1:70b

# Best for reasoning
ollama pull deepseek-r1:7b
Model downloads can be large (3-40GB). Use a reliable connection and allow several minutes or more, depending on the model size.
2. Verify the model

Test that the model works:
ollama run llama3.2:3b "Say hello in 10 words or less"

Update the scripts

Once you’ve pulled a new model, update the scripts to use it:

Step 1: Basic local AI (demo_step1_ollama.py)

stream = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": PROMPT}],
    stream=True,
)
The model name appears around line 268 in scripts/demo_step1_ollama.py.

Step 2: RAG with civic data (demo_step2_rag.py)

Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="all-MiniLM-L6-v2")
Update line 170 in scripts/demo_step2_rag.py.
The embedding model (all-MiniLM-L6-v2) should generally stay the same - it’s used for search, not generation.

Step 3: Web application (demo_step3_app.py)

Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="all-MiniLM-L6-v2")
Update lines 203-204 in scripts/demo_step3_app.py.
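
If editing three scripts by hand feels error-prone, one option is to read the model name from a single place. The sketch below is an illustration only: the `DEMO_MODEL` environment variable and the `resolve_model` helper are hypothetical and do not exist in the demo scripts, and the default mirrors the `llama3.1` value shown in the snippets above.

```python
import os

DEFAULT_MODEL = "llama3.1"  # the demo's default, per the snippets above


def resolve_model(env=None):
    """Return the model name from DEMO_MODEL (hypothetical), else the default."""
    env = os.environ if env is None else env
    name = env.get("DEMO_MODEL", "").strip()
    return name or DEFAULT_MODEL


# In a script, the result would replace the hard-coded string, e.g.
#   Settings.llm = Ollama(model=resolve_model(), request_timeout=120.0)
print(resolve_model({"DEMO_MODEL": "phi3:mini"}))
print(resolve_model({}))
```

Passing a plain dict instead of `os.environ` keeps the helper easy to try out without changing your shell environment.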

Step 4 & 5: BYOD scripts

The BYOD scripts support a --model flag, so no code changes are needed:
# Use a different model via command line
python scripts/demo_step4_byod.py myfile.txt --model phi3:mini
python scripts/demo_step5_byod_app.py --model deepseek-r1:7b
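
For reference, a `--model` flag like the one above is commonly wired up with `argparse`. This is a minimal sketch under stated assumptions, not the BYOD scripts' actual code: the `build_parser` helper, the positional `path` argument, and the `llama3.1` default are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with a positional file path and an optional --model flag."""
    parser = argparse.ArgumentParser(description="Ask a local model about a file")
    parser.add_argument("path", help="file to load (hypothetical positional argument)")
    parser.add_argument(
        "--model",
        default="llama3.1",  # assumed default, mirroring the demo's default model
        help="Ollama model tag, e.g. phi3:mini",
    )
    return parser


# Simulate: python script.py myfile.txt --model phi3:mini
args = build_parser().parse_args(["myfile.txt", "--model", "phi3:mini"])
print(args.path, args.model)
```

The parsed `args.model` value would then be passed wherever the script constructs its LLM.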

Performance considerations

Hardware requirements

| Component | 3B model | 8B model | 70B model |
| --- | --- | --- | --- |
| RAM | 4GB+ | 8GB+ | 40GB+ |
| GPU VRAM | Optional | Optional | Recommended 24GB+ |
| Storage | 2GB | 5GB | 40GB |
| Speed (CPU) | 8-15 tok/s | 3-8 tok/s | <1 tok/s |
| Speed (GPU) | 30-60 tok/s | 15-25 tok/s | 5-10 tok/s |

If you’re doing a live demo and the model is too slow, you can:
  1. Pre-warm the model by running it once before presenting
  2. Use a smaller model for the demo
  3. Tell the audience the slowness demonstrates real-time local inference (turn it into a feature!)
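
The RAM row of the hardware table above can be turned into a simple lookup. The sketch below is an illustration only: `pick_model` is a hypothetical helper, and the thresholds are the table's rough minimums rather than measured limits.

```python
# (minimum RAM in GB, model tag), largest first; values taken from the table above
RAM_TIERS = [
    (40, "llama3.1:70b"),
    (8, "llama3.1"),
    (4, "llama3.2:3b"),
]


def pick_model(ram_gb):
    """Return the largest model whose RAM minimum fits, or None if none fit."""
    for min_ram, tag in RAM_TIERS:
        if ram_gb >= min_ram:
            return tag
    return None


for ram in (4, 16, 64):
    print(ram, "GB ->", pick_model(ram))
```

Because other applications also use memory, treating these thresholds as optimistic is probably wise.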

Timeout settings

Larger models may need longer timeouts:
# Increase timeout for slower models
Settings.llm = Ollama(model="llama3.1:70b", request_timeout=300.0)  # 5 minutes
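
One way to choose a timeout is to work backward from the expected generation speed: the time to produce a response is roughly the token count divided by tokens per second. The sketch below assumes a hypothetical helper, `estimate_timeout`, and an arbitrary 1.5x safety margin. The speeds come from the hardware table above, and the estimate ignores model load time.

```python
def estimate_timeout(max_tokens, tokens_per_second, safety=1.5):
    """Estimate a request timeout in seconds for a response of max_tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return max_tokens / tokens_per_second * safety


# At 1 tok/s (the table's upper bound for a 70B model on CPU),
# a 200-token answer needs about 300 seconds with the margin applied.
print(estimate_timeout(200, 1.0))  # 300.0

# At 5 tok/s (within the table's CPU range for an 8B model),
# the same answer fits comfortably inside the default 120 seconds.
print(estimate_timeout(200, 5.0))  # 60.0
```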

Model selection guide

Choose based on your priorities:
1. Optimize for speed

If you need fast responses or have limited hardware:
ollama pull llama3.2:3b
Update scripts to use model="llama3.2:3b"
2. Optimize for quality

If you have powerful hardware and want best results:
ollama pull llama3.1:70b
Update scripts to use model="llama3.1:70b" and increase request_timeout to 300.0
3. Optimize for reasoning

For complex analysis tasks:
ollama pull deepseek-r1:7b
Update scripts to use model="deepseek-r1:7b"

Check available models

See what models you have downloaded:
# List all downloaded models
ollama list

# Output example:
NAME              ID            SIZE    MODIFIED
llama3.1:latest   42182419e950  4.7 GB  2 hours ago
phi3:mini         64c1188f2485  2.4 GB  1 day ago
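
To check for a model from Python, the listing can be parsed. This is a minimal sketch that assumes the column layout shown in the example output above, which may differ between Ollama versions. `parse_model_names` is a hypothetical helper, and `SAMPLE` stands in for the live command output.

```python
SAMPLE = """\
NAME              ID            SIZE    MODIFIED
llama3.1:latest   42182419e950  4.7 GB  2 hours ago
phi3:mini         64c1188f2485  2.4 GB  1 day ago
"""


def parse_model_names(listing):
    """Return the NAME column from `ollama list`-style text, skipping the header."""
    lines = [line for line in listing.splitlines() if line.strip()]
    return [line.split()[0] for line in lines[1:]]


names = parse_model_names(SAMPLE)
print(names)
print("phi3:mini" in names)
```

In practice the text could come from `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout` instead of the sample string.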
Remove unused models to free space:
ollama rm llama3.1:70b

Next steps

Swap in your own data

Replace the demo datasets with your files

Customize the UI

Modify the Gradio interface appearance
