Overview
This step proves that you can run a GPT-4-class model locally, for free, with no API key. The script sends a civic-themed prompt to your local Ollama instance and streams the response token by token, so you can watch the AI generate in real time.

Duration: ~60 seconds

What you'll see:
- Live AI generation streaming to your terminal
- The hostname and timestamp, proving it's running locally
- Elapsed time, tokens per second, and a cost comparison
- The actual electricity cost vs. what the same query would cost on cloud APIs such as GPT-4o
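The streaming loop can be sketched in Python with only the standard library. This is a minimal sketch, not the demo script itself. It assumes Ollama's default endpoint `http://localhost:11434/api/generate`, the `llama3.1` model, and a placeholder prompt (the demo's real civic prompt is not reproduced here). The names `machine_banner`, `parse_stream`, and `stream_generate` are hypothetical.

```python
import json
import platform
import urllib.request
from datetime import datetime
from typing import Iterable, Iterator

# Assumption: Ollama's default local address and generate endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def machine_banner() -> str:
    """Hostname plus local timestamp, printed before generation starts."""
    return f"Running on {platform.node()} at {datetime.now():%Y-%m-%d %H:%M:%S}"


def parse_stream(lines: Iterable[bytes]) -> Iterator[dict]:
    """Decode Ollama's newline-delimited JSON stream, one chunk per line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank keep-alive lines
            yield json.loads(line)


def stream_generate(prompt: str, model: str = "llama3.1") -> dict:
    """Print tokens as they arrive and return the final chunk, which holds the stats."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    final: dict = {}
    with urllib.request.urlopen(request) as response:
        # Iterating an HTTP response yields it line by line.
        for chunk in parse_stream(response):
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                final = chunk
    print()
    return final
```

With Ollama running, calling `print(machine_banner())` and then `stream_generate("Suggest one way a city could use open data.")` prints tokens as they arrive. The returned final chunk carries `prompt_eval_count` and `eval_count`, the input and output token counts a cost comparison needs.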
Prerequisites
Install Ollama
Download and install Ollama from https://ollama.com
Running the demo
Basic usage
Show help
Expected output
How it works
The script performs these steps:

Display machine identity
Shows the live hostname (`platform.node()`) and current timestamp to prove the AI is running locally on your machine.

Calculate cost comparison
Extracts token counts from Ollama's final streaming chunk and calls `format_cost_comparison()` to show:
- Local electricity cost (watts × seconds / 3600 = Wh, then Wh / 1000 × $/kWh = dollars)
- Cloud API pricing for the same query (GPT-4o, Claude, etc.)
- A cost multiplier showing how many times more expensive the cloud APIs are
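Here is a hedged sketch of the cost arithmetic that makes the formula concrete. The function names, the 30 W draw, the $0.15/kWh electricity rate, and the per-million-token prices are illustrative assumptions, not values taken from `scripts/cost_estimator.py`. Providers change their prices, so verify them before quoting numbers.

```python
# Illustrative USD prices per million tokens as (input, output). Assumed for
# this sketch; providers change prices, so verify before quoting.
CLOUD_PRICES_PER_MTOK = {
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
}

ASSUMED_USD_PER_KWH = 0.15  # placeholder electricity rate


def local_cost_usd(watts: float, seconds: float,
                   usd_per_kwh: float = ASSUMED_USD_PER_KWH) -> float:
    """W x s / 3600 = Wh; Wh / 1000 = kWh; kWh x $/kWh = dollars."""
    watt_hours = watts * seconds / 3600
    return watt_hours / 1000 * usd_per_kwh


def cloud_cost_usd(input_tokens: int, output_tokens: int, model: str) -> float:
    """Per-token API cost using the illustrative price table above."""
    price_in, price_out = CLOUD_PRICES_PER_MTOK[model]
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out


def cost_comparison(final_chunk: dict, seconds: float, watts: float) -> str:
    """One-line summary; token counts come from Ollama's final streaming chunk."""
    input_tokens = final_chunk.get("prompt_eval_count", 0)
    output_tokens = final_chunk.get("eval_count", 0)
    local = local_cost_usd(watts, seconds)
    parts = [f"Local: ${local:.6f}"]
    for model in CLOUD_PRICES_PER_MTOK:
        cloud = cloud_cost_usd(input_tokens, output_tokens, model)
        multiplier = cloud / local if local > 0 else float("inf")
        parts.append(f"{model}: ${cloud:.5f} ({multiplier:,.0f}x)")
    return " | ".join(parts)
```

Take 100 input tokens, 400 output tokens, and 20 seconds at an assumed 30 W. The local run costs about $0.000025 in electricity. At the assumed GPT-4o prices, the same query costs $0.00425, about 170x more.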
The civic prompt
The script uses this hardcoded prompt to make the demo relevant:
Performance expectations
| Hardware | Expected speed |
|---|---|
| Apple Silicon (M1/M2/M3/M4) | 15-25 tokens/second |
| Recent Intel/AMD CPU | 3-8 tokens/second |
| Older CPU-only | 1-3 tokens/second |
First runs are slower because the model needs to load into memory. Run the script once before presenting to pre-warm everything.
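Ollama's final streaming chunk also reports timings in nanoseconds. This gives one way to reproduce the tokens-per-second figures in the table and to see the cold-start delay. This is a minimal sketch. The helper names are hypothetical. The fields `eval_count`, `eval_duration`, and `load_duration` come from Ollama's `/api/generate` response.

```python
def tokens_per_second(final_chunk: dict) -> float:
    """Generation speed; eval_duration is reported in nanoseconds."""
    count = final_chunk.get("eval_count", 0)
    duration_ns = final_chunk.get("eval_duration", 0)
    return count / (duration_ns / 1e9) if duration_ns else 0.0


def load_seconds(final_chunk: dict) -> float:
    """Model load time: large on a cold first run, near zero once the model is warm."""
    return final_chunk.get("load_duration", 0) / 1e9


def seconds_for(tokens: int, tps: float) -> float:
    """Rough wall-clock time to generate a response of the given length."""
    return tokens / tps
```

For a sense of scale, a 300-token answer takes about 15 seconds at 20 tokens/second. At 3 tokens/second, the same answer takes about 100 seconds, which is why a pre-warmed model and fast hardware matter for a live demo.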
Cost estimator module
All demo scripts share `scripts/cost_estimator.py`, which provides:
- `detect_power_watts()`: auto-detects hardware wattage (Apple Silicon base/Pro/Max, x86 laptop, desktop GPU, etc.)
- `estimate_local_cost(duration, watts)`: calculates the actual electricity cost
- `estimate_cloud_cost(input_tokens, output_tokens)`: looks up published per-token pricing for GPT-4o, Claude 3.5 Sonnet, Gemini 2.5 Flash, and others
- `format_cost_comparison()`: full one-line summary for terminal output
- `format_cost_short()`: compact format for Gradio chat metadata
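Python's standard `platform` module can approximate the wattage detection. This sketch is hypothetical. Its categories are simplified: it does not distinguish Apple Silicon base/Pro/Max chips, and it does not detect a GPU. The wattages are placeholders, not the figures the real module uses. The OS and architecture are injectable so the logic can be checked on any machine.

```python
import platform
from typing import Optional

# Hypothetical average power draw under inference load, in watts.
ASSUMED_WATTS = {
    "apple_silicon": 30.0,
    "laptop": 45.0,
    "desktop": 250.0,
}


def detect_power_watts(system: Optional[str] = None,
                       machine: Optional[str] = None,
                       is_laptop: bool = True) -> float:
    """Pick an assumed wattage from the OS name and CPU architecture."""
    system = system if system is not None else platform.system()
    machine = (machine if machine is not None else platform.machine()).lower()
    # macOS on arm64 means an Apple Silicon Mac.
    if system == "Darwin" and machine == "arm64":
        return ASSUMED_WATTS["apple_silicon"]
    # Everything else falls back to a caller-supplied laptop/desktop hint.
    return ASSUMED_WATTS["laptop" if is_laptop else "desktop"]
```

Calling `detect_power_watts()` with no arguments inspects the current machine. Pass `is_laptop=False` on a desktop, since this sketch cannot tell the two apart by itself.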
Troubleshooting
Error: Ollama isn't responding
Run `ollama serve` to start the daemon, then verify with `ollama list`.

Error: Model not found
Run `ollama pull llama3.1` to download the model.

Error: Connection refused
Ollama listens on `localhost:11434` by default. Make sure the daemon is running and that no firewall is blocking that port.
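The errors above can be told apart before a presentation with a quick pre-flight check. This sketch assumes Ollama's default address and uses its `/api/tags` endpoint, which lists locally installed models much like `ollama list`. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"  # Ollama's default address


def model_available(tags: dict, model: str = "llama3.1") -> bool:
    """True if /api/tags lists the model; names carry a tag, e.g. 'llama3.1:latest'."""
    names = [entry.get("name", "") for entry in tags.get("models", [])]
    return any(name == model or name.startswith(model + ":") for name in names)


def check_ollama(model: str = "llama3.1", timeout: float = 3.0) -> str:
    """Return a human-readable diagnosis matching the troubleshooting entries."""
    try:
        with urllib.request.urlopen(f"{OLLAMA_BASE}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except OSError as exc:  # URLError (e.g. connection refused) and timeouts
        return f"Cannot reach Ollama at {OLLAMA_BASE} ({exc}); run `ollama serve`."
    if not model_available(tags, model):
        return f"Model {model!r} not found; run `ollama pull {model}`."
    return "Ollama is up and the model is installed."
```

Running `print(check_ollama())` a few minutes before presenting confirms both the daemon and the model in one step.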

Response is very slow
CPU-only machines typically generate 1-8 tokens/second, depending on CPU age (see Performance expectations). Close other applications to free RAM. Apple Silicon Macs are much faster (15-25 tokens/second).