FinOps requires a Pro plan or above. The /finops page is gated on lower-tier plans.
What you can track
Cost by model
See exactly how much each LLM model costs you — GPT-4o, Claude Sonnet, GPT-4o-mini, and others. Identify if you’re over-relying on expensive models.
Cost by agent
Per-agent cost breakdown. Find your most expensive agents so you can optimize their tool usage or switch them to cheaper models.
Budget tracking
Monthly budget burn-down chart with actual spend (solid line), budget cap (dashed line), and projected end-of-month spend (dotted line).
Cache hit rate
Monitor how effectively semantic caching reduces your costs. Higher cache hit rate means fewer LLM calls and lower spend.
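The projected end-of-month line in the budget chart can be approximated by hand. How Drako computes its projection is not stated here, so the sketch below assumes a simple linear extrapolation of month-to-date spend; the function name is hypothetical.

```python
import calendar
from datetime import date


def project_month_end_spend(spend_to_date: float, today: date) -> float:
    """Linearly extrapolate month-to-date spend to the end of the month.

    Assumption: spend continues at the average daily rate seen so far.
    This is an illustration, not Drako's documented projection method.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date / today.day  # average spend per elapsed day
    return daily_rate * days_in_month


# $300 spent by June 10 projects to $30/day over a 30-day month.
projected = project_month_end_spend(300.0, date(2025, 6, 10))
print(f"Projected end-of-month spend: ${projected:.2f}")
```

Comparing the projected value against the monthly cap gives an early signal that the dotted line will cross the dashed one before the month ends.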
Smart model routing
Route tasks to the cheapest model that can handle them. Define routing rules based on task complexity or other conditions. For example, simple summarization tasks go to gpt-4o-mini while complex reasoning stays on gpt-4o.
Routing happens transparently before the LLM call. Your agents don’t need to know which model they’re using.
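To make the routing idea concrete, here is a minimal sketch of rule-based model selection. It is not Drako's implementation: the condition expression syntax is not shown in this page, so conditions are modelled as plain Python callables, the task attribute `type` is hypothetical, and "first matching rule wins" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    condition: Callable[[dict], bool]  # stand-in for a rule's condition expression
    model: str                         # model to use when the condition is true
    reason: str                        # human-readable explanation


def route(task: dict, rules: List[Rule], default_model: str) -> Tuple[str, str]:
    """Return (model, reason) for a task; assumes the first matching rule wins."""
    for rule in rules:
        if rule.condition(task):
            return rule.model, rule.reason
    return default_model, "no rule matched"


# Hypothetical rule: send simple summarization work to the cheaper model.
RULES = [
    Rule(
        condition=lambda task: task.get("type") == "summarization",
        model="gpt-4o-mini",
        reason="simple summarization",
    ),
]

print(route({"type": "summarization"}, RULES, default_model="gpt-4o"))
print(route({"type": "reasoning"}, RULES, default_model="gpt-4o"))
```

Because the selection happens in one place before the call, the calling agent only ever sees the response, which matches the transparent behaviour described above.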
Semantic caching
Semantic caching skips LLM calls entirely when a sufficiently similar query has already been answered. Drako compares incoming queries against a vector cache using a configurable similarity threshold.
- Queries at or above the similarity threshold are served from cache: no LLM call, no cost.
- The default threshold is 0.92 (92% semantic similarity). Lower it to cache more aggressively; raise it for stricter matching.
- Cache entries expire after a configurable TTL (default: 24 hours).
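The lookup described above can be sketched in a few lines. This is an illustration under assumptions, not Drako's cache: query embeddings are assumed to be computed elsewhere and passed in as plain lists of floats, the class and method names are hypothetical, and a linear scan stands in for a real vector index.

```python
import math
import time


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, similarity_threshold=0.92, ttl_hours=24):
        self.threshold = similarity_threshold
        self.ttl_seconds = ttl_hours * 3600
        self.entries = []  # list of (embedding, response, stored_at)

    def put(self, embedding, response, now=None):
        stored_at = time.time() if now is None else now
        self.entries.append((embedding, response, stored_at))

    def get(self, embedding, now=None):
        """Return the best cached response at or above the threshold, else None."""
        now = time.time() if now is None else now
        # Drop entries older than the TTL before searching.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl_seconds]
        best_response, best_score = None, -1.0
        for cached_embedding, response, _ in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score >= self.threshold and score > best_score:
                best_response, best_score = response, score
        return best_response


cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer", now=0)
print(cache.get([0.99, 0.05], now=60))  # near-duplicate query: served from cache
print(cache.get([0.0, 1.0], now=60))    # unrelated query: miss, call the LLM
```

Lowering `similarity_threshold` makes the first kind of hit more common at the risk of serving an answer to a question that was only loosely related.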
Configuration
Configure FinOps in the policies.finops section of .drako.yaml. The available fields are listed below.
Configuration fields
tracking
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable cost tracking |
| model_costs.<model>.input | float | — | Cost per 1K input tokens in USD |
| model_costs.<model>.output | float | — | Cost per 1K output tokens in USD |
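Since model costs are expressed per 1K tokens, the cost of a call follows directly from its token counts. The sketch below shows that arithmetic and a per-model rollup; the prices are placeholder values rather than real vendor pricing, and the function names and the `(model, input_tokens, output_tokens)` record shape are hypothetical.

```python
from collections import defaultdict

# Placeholder rates in USD per 1K tokens (illustrative only, not real prices).
MODEL_COSTS = {
    "gpt-4o": {"input": 0.005, "output": 0.015},
    "gpt-4o-mini": {"input": 0.0005, "output": 0.0015},
}


def call_cost(model, input_tokens, output_tokens, model_costs=MODEL_COSTS):
    """Cost of one LLM call in USD, given per-1K-token input and output rates."""
    rates = model_costs[model]
    return (input_tokens / 1000) * rates["input"] + (output_tokens / 1000) * rates["output"]


def cost_by_model(calls, model_costs=MODEL_COSTS):
    """Sum call costs per model; `calls` holds (model, input_tokens, output_tokens)."""
    totals = defaultdict(float)
    for model, input_tokens, output_tokens in calls:
        totals[model] += call_cost(model, input_tokens, output_tokens, model_costs)
    return dict(totals)


calls = [
    ("gpt-4o", 2000, 1000),
    ("gpt-4o-mini", 2000, 1000),
    ("gpt-4o", 1000, 0),
]
for model, total in cost_by_model(calls).items():
    print(f"{model}: ${total:.4f}")
```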
routing
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable smart model routing |
| default_model | string | — | Model to use when no rule matches |
| rules[].condition | string | — | Expression that triggers this rule |
| rules[].model | string | — | Model to use when the condition is true |
| rules[].reason | string | — | Human-readable explanation (logged) |
cache
| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable semantic caching |
| similarity_threshold | float | 0.92 | Minimum cosine similarity to serve from cache |
| ttl_hours | int | 24 | How long to retain cache entries, in hours |
budgets
| Field | Type | Default | Description |
|---|---|---|---|
| daily_usd | float | — | Daily spend cap in USD |
| weekly_usd | float | — | Weekly spend cap in USD |
| monthly_usd | float | — | Monthly spend cap in USD |
| alert_at_percent | list[int] | [80, 95] | Send alerts when spend reaches these percentages of the budget |
Budget alerts
Drako sends budget alerts when your spend reaches the configured thresholds. The default configuration alerts at 50%, 80%, and 95% of your monthly budget. See the alerts section of your policy.
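The threshold check itself is simple percentage arithmetic. The sketch below is an illustration, not Drako's alerting code: the function names are hypothetical, thresholds are passed in explicitly in the shape of alert_at_percent, and firing each threshold only once as spend grows is an assumption.

```python
def crossed_thresholds(spend_usd, budget_usd, alert_at_percent):
    """Return the alert percentages that current spend has reached, ascending."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    percent_used = spend_usd / budget_usd * 100
    return [p for p in sorted(alert_at_percent) if percent_used >= p]


def new_alerts(previous_spend, current_spend, budget_usd, alert_at_percent):
    """Thresholds crossed since the last check, so each one fires only once."""
    already_sent = set(crossed_thresholds(previous_spend, budget_usd, alert_at_percent))
    reached = crossed_thresholds(current_spend, budget_usd, alert_at_percent)
    return [p for p in reached if p not in already_sent]


thresholds = [80, 95]  # same shape as the alert_at_percent field
print(crossed_thresholds(850, 1000, thresholds))   # spend is at 85% of budget
print(new_alerts(850, 960, 1000, thresholds))      # spend moved from 85% to 96%
```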
FinOps scan rules
Drako’s static scanner checks for missing FinOps practices:
| Rule | Severity | What it checks |
|---|---|---|
| FIN-001 | MEDIUM | No cost tracking configured — finops.tracking.enabled is missing or false |
| FIN-002 | LOW | Single model used for all tasks — no routing rules defined, potential over-spend on expensive models |
| FIN-003 | LOW | No cache configured — every query hits the LLM, even repeated ones |
To resolve these findings, add a policies.finops section to your .drako.yaml. Run drako init to generate a starter config with FinOps enabled.
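As a rough picture of what the three rules look for, the sketch below applies them to an already-parsed policies.finops mapping. It is not Drako's scanner: the function name is hypothetical, and treating "no cache configured" as a missing or disabled cache section is an interpretation of the rule text.

```python
from typing import List, Optional


def scan_finops(finops: Optional[dict]) -> List[str]:
    """Return the IDs of FinOps rules that a parsed policies.finops mapping violates."""
    finops = finops or {}
    tracking = finops.get("tracking") or {}
    routing = finops.get("routing") or {}
    cache = finops.get("cache") or {}

    findings = []
    if not tracking.get("enabled", False):   # FIN-001: tracking.enabled missing or false
        findings.append("FIN-001")
    if not routing.get("rules"):             # FIN-002: no routing rules defined
        findings.append("FIN-002")
    if not cache.get("enabled", False):      # FIN-003: assumed to mean cache absent or disabled
        findings.append("FIN-003")
    return findings


print(scan_finops(None))  # no finops section at all: all three rules fire
print(scan_finops({
    "tracking": {"enabled": True},
    "routing": {"rules": [{"condition": "<expression>", "model": "gpt-4o-mini"}]},
    "cache": {"enabled": True},
}))
```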
Dashboard
The FinOps dashboard at /finops includes:
- Summary cards — total monthly spend, top model by cost, and current cache hit rate
- Cost by model — donut chart of spend distribution across LLM models
- Cost by agent — bar chart ranking agents by cost
- Budget tracking — burn-down chart with actual, budgeted, and projected spend
The corresponding API endpoints are GET /finops/summary, GET /finops/model-breakdown, GET /finops/agent-breakdown, and GET /finops/budget. Cost breakdowns update hourly.