FinOps

Drako FinOps shows what your AI agents spend and gives you tools to reduce that spend without changing how your agents work. Track costs per model and per agent, set monthly budgets, route simple tasks to cheaper models, and cache repeated queries to skip LLM calls entirely.

FinOps requires a Pro plan or above. The /finops page is gated on lower-tier plans.

What you can track

Cost by model

See how much each model costs you (GPT-4o, Claude Sonnet, GPT-4o-mini, and others) and identify whether you're over-relying on expensive models.

Cost by agent

Per-agent cost breakdown. Find your most expensive agents so you can optimize their tool usage or switch them to cheaper models.

Budget tracking

Monthly budget burn-down chart with actual spend (solid line), budget cap (dashed line), and projected end-of-month spend (dotted line).

Cache hit rate

Monitor how effectively semantic caching reduces your costs. A higher cache hit rate means fewer LLM calls and lower spend.

Smart model routing

Route tasks to the cheapest model that can handle them. Define routing rules based on task complexity or other conditions — simple summarization tasks go to gpt-4o-mini while complex reasoning stays on gpt-4o. Routing happens transparently before the LLM call. Your agents don’t need to know which model they’re using.
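The rule matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `pick_model` is a hypothetical helper, not part of Drako, and it understands only conditions of the form `field == 'value'`. Drako's actual expression syntax and evaluation order are not specified here, so the first-match-wins behavior is an assumption.

```python
import re

# Hypothetical rule shape mirroring the YAML example: condition, model, reason.
# Only `field == 'value'` conditions are parsed; the real syntax may be richer.
_CONDITION = re.compile(r"^\s*(\w+)\s*==\s*'([^']*)'\s*$")


def pick_model(task, rules, default_model):
    """Return (model, reason) for the first rule whose condition matches."""
    for rule in rules:
        match = _CONDITION.match(rule["condition"])
        if match is None:
            continue  # skip conditions this sketch cannot parse
        field, expected = match.groups()
        if task.get(field) == expected:
            return rule["model"], rule["reason"]
    # No rule matched: fall back to the configured default model.
    return default_model, "no rule matched"


rules = [
    {
        "condition": "task_complexity == 'low'",
        "model": "gpt-4o-mini",
        "reason": "Use cheaper model for simple tasks",
    }
]

simple = pick_model({"task_complexity": "low"}, rules, "gpt-4o")
hard = pick_model({"task_complexity": "high"}, rules, "gpt-4o")
```

In this sketch the low-complexity task resolves to gpt-4o-mini, and anything else falls through to the default model, which matches the role of `default_model` in the configuration.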

Semantic caching

Semantic caching skips LLM calls entirely when a sufficiently similar query has already been answered. Drako compares incoming queries against a vector cache using a configurable similarity threshold.

Queries whose similarity to a cached entry meets or exceeds the threshold are served from cache, with no LLM call and no cost.
The default threshold is 0.92 (92% semantic similarity). Lower it to cache more aggressively; raise it for stricter matching.
Cache entries expire after a configurable TTL (default: 24 hours).
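To make the threshold and TTL behavior concrete, here is a minimal in-memory sketch. It assumes cosine similarity as the metric (as the `similarity_threshold` field description states) and uses toy two-dimensional vectors in place of real embeddings. The `SemanticCache` class and its methods are hypothetical and do not reflect Drako's internal implementation.

```python
import math
import time


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical cache: linear scan over (embedding, response, stored_at)."""

    def __init__(self, similarity_threshold=0.92, ttl_hours=24):
        self.threshold = similarity_threshold
        self.ttl_seconds = ttl_hours * 3600
        self.entries = []

    def put(self, embedding, response, now=None):
        stored_at = time.time() if now is None else now
        self.entries.append((embedding, response, stored_at))

    def get(self, embedding, now=None):
        now = time.time() if now is None else now
        # Drop entries older than the TTL before searching.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl_seconds]
        best_score, best_response = 0.0, None
        for cached_embedding, response, _ in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        # Serve from cache only when the best match meets the threshold.
        if best_response is not None and best_score >= self.threshold:
            return best_response
        return None


cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer", now=0)
hit = cache.get([0.98, 0.1], now=60)          # very similar query
miss = cache.get([0.5, 0.5], now=60)          # similarity well below 0.92
expired = cache.get([1.0, 0.0], now=25 * 3600)  # past the 24-hour TTL
```

A production cache would use a vector index rather than a linear scan; the scan is kept here only to show where the threshold and the TTL each take effect.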

Configuration

Configure FinOps in the policies.finops section of .drako.yaml:

policies:
  finops:
    tracking:
      enabled: true
      model_costs:
        gpt-4o:
          input: 0.0025
          output: 0.01
        gpt-4o-mini:
          input: 0.00015
          output: 0.0006
    routing:
      enabled: true
      default_model: gpt-4o
      rules:
        - condition: "task_complexity == 'low'"
          model: gpt-4o-mini
          reason: "Use cheaper model for simple tasks"
    cache:
      enabled: true
      similarity_threshold: 0.92
      ttl_hours: 24
    budgets:
      daily_usd: 50.00
      weekly_usd: 250.00
      monthly_usd: 1000.00
      alert_at_percent: [50, 80, 95]

Configuration fields

tracking

Field	Type	Default	Description
`enabled`	`bool`	`true`	Enable cost tracking
`model_costs.<model>.input`	`float`	—	Cost per 1K input tokens in USD
`model_costs.<model>.output`	`float`	—	Cost per 1K output tokens in USD
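Because `model_costs` values are expressed per 1K tokens, the cost of a single call follows from its token counts. The sketch below works through that arithmetic with the prices from the example configuration. `call_cost_usd` is a hypothetical helper for illustration, not a Drako API.

```python
# Prices copied from the example configuration (USD per 1K tokens).
MODEL_COSTS = {
    "gpt-4o": {"input": 0.0025, "output": 0.01},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}


def call_cost_usd(model, input_tokens, output_tokens, model_costs=MODEL_COSTS):
    """Cost of one LLM call given its token counts and per-1K-token prices."""
    prices = model_costs[model]
    input_cost = (input_tokens / 1000) * prices["input"]
    output_cost = (output_tokens / 1000) * prices["output"]
    return input_cost + output_cost


# The same call (12,000 input tokens, 3,000 output tokens) on both models.
big = call_cost_usd("gpt-4o", 12_000, 3_000)
small = call_cost_usd("gpt-4o-mini", 12_000, 3_000)
```

With these example prices the call costs about 0.06 USD on gpt-4o and about 0.0036 USD on gpt-4o-mini, which is the kind of gap that routing is meant to exploit.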

routing

Field	Type	Default	Description
`enabled`	`bool`	`false`	Enable smart model routing
`default_model`	`string`	—	Model to use when no rule matches
`rules[].condition`	`string`	—	Expression that triggers this rule
`rules[].model`	`string`	—	Model to use when the condition is true
`rules[].reason`	`string`	—	Human-readable explanation (logged)

cache

Field	Type	Default	Description
`enabled`	`bool`	`false`	Enable semantic caching
`similarity_threshold`	`float`	`0.92`	Minimum cosine similarity to serve from cache
`ttl_hours`	`int`	`24`	How long to retain cache entries

budgets

Field	Type	Default	Description
`daily_usd`	`float`	—	Daily spend cap
`weekly_usd`	`float`	—	Weekly spend cap
`monthly_usd`	`float`	—	Monthly spend cap
`alert_at_percent`	`list[int]`	`[80, 95]`	Send alerts when spend reaches these percentages of the budget

Budget alerts

Drako sends budget alerts when your spend reaches the configured thresholds. The example below alerts at 50%, 80%, and 95% of the monthly budget. If you omit alert_at_percent, the default thresholds are 80% and 95%.

budgets:
  monthly_usd: 1000.00
  alert_at_percent: [50, 80, 95]

Budget alerts are delivered through the same channels as your other alert rules — Slack, email, or PagerDuty. Configure channels in the alerts section of your policy.
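The threshold logic can be sketched as follows. This is an assumption about how such alerts are typically evaluated, not a description of Drako's internals: the function names are hypothetical, and the sketch fires each threshold once, when spend first reaches it.

```python
def crossed_thresholds(spend_usd, budget_usd, alert_at_percent=(80, 95)):
    """Return the thresholds (in percent) that current spend has reached."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    # Compare spend * 100 against threshold * budget to avoid a division.
    return [
        t for t in sorted(alert_at_percent)
        if spend_usd * 100 >= t * budget_usd
    ]


def new_alerts(previous_spend, current_spend, budget_usd, alert_at_percent=(80, 95)):
    """Thresholds reached since the previous check, so each fires only once."""
    before = set(crossed_thresholds(previous_spend, budget_usd, alert_at_percent))
    after = crossed_thresholds(current_spend, budget_usd, alert_at_percent)
    return [t for t in after if t not in before]


# Monthly budget of 1000 USD; spend moved from 450 to 820 between checks.
fired = new_alerts(450.0, 820.0, 1000.0, [50, 80, 95])
```

Here the 50% and 80% thresholds are both reached between the two checks, so both would be reported, while 95% is still pending.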

Budget tracking computes spend from token counts using the per-1K-token model_costs values you configure. Make sure these values reflect your actual contract pricing with each model provider.

FinOps scan rules

Drako’s static scanner checks for missing FinOps practices:

Rule	Severity	What it checks
FIN-001	MEDIUM	No cost tracking configured — `finops.tracking.enabled` is missing or false
FIN-002	LOW	Single model used for all tasks — no routing rules defined, potential over-spend on expensive models
FIN-003	LOW	No cache configured — every query hits the LLM, even repeated ones

Fix FIN-001 through FIN-003 by adding a policies.finops section to your .drako.yaml that enables tracking, defines at least one routing rule, and enables the cache. Run drako init to generate a starter config with FinOps enabled.
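As a rough illustration of what these three checks look for, the sketch below applies them to an already-parsed configuration dictionary. `scan_finops` is a hypothetical function, not the Drako scanner, and treating a disabled cache the same as a missing one for FIN-003 is an assumption.

```python
def scan_finops(config):
    """Return (rule_id, severity) findings for a parsed .drako.yaml dict."""
    policies = config.get("policies") or {}
    finops = policies.get("finops") or {}
    tracking = finops.get("tracking") or {}
    routing = finops.get("routing") or {}
    cache = finops.get("cache") or {}

    findings = []
    # FIN-001: tracking.enabled is missing or false.
    if not tracking.get("enabled", False):
        findings.append(("FIN-001", "MEDIUM"))
    # FIN-002: no routing rules defined.
    if not routing.get("rules"):
        findings.append(("FIN-002", "LOW"))
    # FIN-003: cache missing or disabled (assumed equivalent here).
    if not cache.get("enabled", False):
        findings.append(("FIN-003", "LOW"))
    return findings


empty_findings = scan_finops({})
clean_findings = scan_finops({
    "policies": {
        "finops": {
            "tracking": {"enabled": True},
            "routing": {
                "enabled": True,
                "default_model": "gpt-4o",
                "rules": [
                    {"condition": "task_complexity == 'low'", "model": "gpt-4o-mini"}
                ],
            },
            "cache": {"enabled": True},
        }
    }
})
```

An empty configuration triggers all three rules in this sketch, and a configuration shaped like the example in the Configuration section triggers none.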

Dashboard

The FinOps dashboard at /finops includes:

Summary cards — total monthly spend, top model by cost, and current cache hit rate
Cost by model — donut chart of spend distribution across LLM models
Cost by agent — bar chart ranking agents by cost
Budget tracking — burn-down chart with actual, budgeted, and projected spend

Data is sourced from GET /finops/summary, GET /finops/model-breakdown, GET /finops/agent-breakdown, and GET /finops/budget. Cost breakdowns update hourly.
