Drako FinOps gives you full visibility into your AI agent spend and tools to reduce it — without changing how your agents work. Track costs per model and per agent, set monthly budgets, route simple tasks to cheaper models, and cache repeated queries to skip LLM calls entirely.
FinOps requires a Pro plan or above. The /finops page is gated on lower-tier plans.

What you can track

Cost by model

See exactly how much each LLM costs you, including GPT-4o, Claude Sonnet, GPT-4o-mini, and others. Identify whether you’re over-relying on expensive models.

Cost by agent

Per-agent cost breakdown. Find your most expensive agents so you can optimize their tool usage or switch them to cheaper models.

Budget tracking

Monthly budget burn-down chart with actual spend (solid line), budget cap (dashed line), and projected end-of-month spend (dotted line).
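
How Drako computes the projected line is not described here. As a rough illustration, a minimal sketch assuming simple linear extrapolation of month-to-date spend could look like this (the function name `project_month_end_spend` is hypothetical):

```python
import calendar
from datetime import date


def project_month_end_spend(spend_to_date: float, today: date) -> float:
    """Linearly extrapolate month-to-date spend to the end of the month.

    Assumption: spend continues at the average daily rate seen so far.
    This is an illustration, not Drako's documented projection method.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date / today.day  # average spend per elapsed day
    return daily_rate * days_in_month


# $400 spent by June 12 (a 30-day month), compared against a $1,000 cap.
projected = project_month_end_spend(400.0, date(2025, 6, 12))
print(f"projected end-of-month spend: ${projected:.2f}")
```

Comparing the projected value with the budget cap shows early in the month whether the current pace will exceed it.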

Cache hit rate

Monitor how effectively semantic caching reduces your costs. A higher cache hit rate means fewer LLM calls and lower spend.

Smart model routing

Route tasks to the cheapest model that can handle them. Define routing rules based on task complexity or other conditions — simple summarization tasks go to gpt-4o-mini while complex reasoning stays on gpt-4o. Routing happens transparently before the LLM call. Your agents don’t need to know which model they’re using.
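
To make the routing behavior concrete, here is a minimal sketch of rule-based model selection. It is not Drako's implementation: the `Rule` class, the `route` function, and the first-match-wins ordering are assumptions, and the condition parser only handles the `field == 'value'` form used in the example configuration.

```python
import re
from dataclasses import dataclass

# Assumption: only conditions of the form  field == 'value'  are supported.
# Drako's real expression syntax may be richer.
_CONDITION = re.compile(r"^\s*(\w+)\s*==\s*'([^']*)'\s*$")


@dataclass
class Rule:
    condition: str
    model: str
    reason: str = ""


def matches(condition: str, task: dict) -> bool:
    """Evaluate a simple equality condition against task attributes."""
    parsed = _CONDITION.match(condition)
    if parsed is None:
        raise ValueError(f"unsupported condition: {condition!r}")
    field, expected = parsed.groups()
    return task.get(field) == expected


def route(task: dict, rules: list, default_model: str) -> str:
    """Return the model of the first matching rule, else the default model."""
    for rule in rules:
        if matches(rule.condition, task):
            return rule.model
    return default_model


rules = [Rule("task_complexity == 'low'", "gpt-4o-mini", "Use cheaper model for simple tasks")]
print(route({"task_complexity": "low"}, rules, "gpt-4o"))
print(route({"task_complexity": "high"}, rules, "gpt-4o"))
```

Because the decision is made before the LLM call, the calling agent only passes task attributes and never names a model itself.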

Semantic caching

Semantic caching skips LLM calls entirely when a sufficiently similar query has already been answered. Drako compares incoming queries against a vector cache using a configurable similarity threshold.
  • Queries whose similarity to a cached query meets or exceeds the threshold are served from cache, with no LLM call and no cost.
  • The default threshold is 0.92 (92% semantic similarity). Lower it to cache more aggressively; raise it for stricter matching.
  • Cache entries expire after a configurable TTL (default: 24 hours).
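
The lookup described above can be sketched as follows. This is an illustrative in-memory version, not Drako's vector cache: the `SemanticCache` class is hypothetical, it takes precomputed embedding vectors rather than raw queries, and it scans entries linearly instead of using a vector index.

```python
import math
import time


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by query embeddings."""

    def __init__(self, similarity_threshold=0.92, ttl_hours=24):
        self.threshold = similarity_threshold
        self.ttl_seconds = ttl_hours * 3600
        self.entries = []  # list of (embedding, response, stored_at)

    def put(self, embedding, response, now=None):
        stored_at = time.time() if now is None else now
        self.entries.append((embedding, response, stored_at))

    def get(self, embedding, now=None):
        """Return the best cached response at or above the threshold, else None."""
        now = time.time() if now is None else now
        # Drop entries older than the TTL before searching.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl_seconds]
        best_score, best_response = -1.0, None
        for vector, response, _ in self.entries:
            score = cosine_similarity(embedding, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer", now=0)
print(cache.get([0.99, 0.05], now=60))  # near-identical query vector
print(cache.get([0.0, 1.0], now=60))    # unrelated query vector
```

A miss (`None`) is the signal to call the LLM and store its response with `put`.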

Configuration

Configure FinOps in the policies.finops section of .drako.yaml:
policies:
  finops:
    tracking:
      enabled: true
      model_costs:
        gpt-4o:
          input: 0.0025
          output: 0.01
        gpt-4o-mini:
          input: 0.00015
          output: 0.0006
    routing:
      enabled: true
      default_model: gpt-4o
      rules:
        - condition: "task_complexity == 'low'"
          model: gpt-4o-mini
          reason: "Use cheaper model for simple tasks"
    cache:
      enabled: true
      similarity_threshold: 0.92
      ttl_hours: 24
    budgets:
      daily_usd: 50.00
      weekly_usd: 250.00
      monthly_usd: 1000.00
      alert_at_percent: [50, 80, 95]

Configuration fields

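policies.finops.tracking

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | true | Enable cost tracking |
| model_costs.<model>.input | float | | Cost per 1K input tokens in USD |
| model_costs.<model>.output | float | | Cost per 1K output tokens in USD |

policies.finops.routing

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable smart model routing |
| default_model | string | | Model to use when no rule matches |
| rules[].condition | string | | Expression that triggers this rule |
| rules[].model | string | | Model to use when the condition is true |
| rules[].reason | string | | Human-readable explanation (logged) |

policies.finops.cache

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable semantic caching |
| similarity_threshold | float | 0.92 | Minimum cosine similarity to serve from cache |
| ttl_hours | int | 24 | How long to retain cache entries, in hours |

policies.finops.budgets

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| daily_usd | float | | Daily spend cap in USD |
| weekly_usd | float | | Weekly spend cap in USD |
| monthly_usd | float | | Monthly spend cap in USD |
| alert_at_percent | list[int] | [80, 95] | Send alerts when spend reaches these percentages of the budget |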

Budget alerts

Drako sends budget alerts when your spend reaches the configured thresholds. The example below alerts at 50%, 80%, and 95% of your monthly budget. If alert_at_percent is omitted, the default is [80, 95].
budgets:
  monthly_usd: 1000.00
  alert_at_percent: [50, 80, 95]
Budget alerts are delivered through the same channels as your other alert rules — Slack, email, or PagerDuty. Configure channels in the alerts section of your policy.
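
As an illustration of the threshold logic, the sketch below reports which alert percentages are crossed between two spend readings. The function name `crossed_thresholds` is hypothetical, and firing each threshold only once, at the moment it is crossed, is an assumption about the behavior.

```python
def crossed_thresholds(previous_spend, current_spend, budget, alert_at_percent):
    """Return the alert percentages crossed between two spend readings.

    Assumption: a threshold fires once, when spend moves from below the
    limit to at or above it.
    """
    crossed = []
    for percent in sorted(alert_at_percent):
        limit = budget * percent / 100
        if previous_spend < limit <= current_spend:
            crossed.append(percent)
    return crossed


# Spend jumps from $450 to $820 against a $1,000 monthly budget.
print(crossed_thresholds(450.0, 820.0, 1000.0, [50, 80, 95]))
```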
Budget tracking computes cost from token counts using the model_costs values you configure, which are expressed in USD per 1K tokens. Make sure these values reflect your actual contract pricing with each model provider.
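
For a worked example of the arithmetic, the sketch below prices a single call using the per-1K-token rates from the example configuration. The helper `call_cost_usd` is hypothetical and only illustrates the formula.

```python
# Rates copied from the example configuration: USD per 1K tokens.
MODEL_COSTS = {
    "gpt-4o": {"input": 0.0025, "output": 0.01},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}


def call_cost_usd(model, input_tokens, output_tokens, model_costs=MODEL_COSTS):
    """Cost of one LLM call: tokens / 1000 * rate, summed over input and output."""
    rates = model_costs[model]
    input_cost = (input_tokens / 1000) * rates["input"]
    output_cost = (output_tokens / 1000) * rates["output"]
    return input_cost + output_cost


for model in MODEL_COSTS:
    print(f"{model}: ${call_cost_usd(model, 2000, 500):.4f}")
```

With these example rates, a call with 2,000 input tokens and 500 output tokens costs $0.0100 on gpt-4o and $0.0006 on gpt-4o-mini.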

FinOps scan rules

Drako’s static scanner checks for missing FinOps practices:
| Rule | Severity | What it checks |
| --- | --- | --- |
| FIN-001 | MEDIUM | No cost tracking configured: finops.tracking.enabled is missing or false |
| FIN-002 | LOW | Single model used for all tasks: no routing rules defined, potential over-spend on expensive models |
| FIN-003 | LOW | No cache configured: every query hits the LLM, even repeated ones |
Fix FIN-001 through FIN-003 by adding a policies.finops section to your .drako.yaml. Run drako init to generate a starter config with FinOps enabled.
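
The three checks can be approximated on an already-parsed configuration, as in the sketch below. This is not Drako's scanner: `scan_finops` is a hypothetical function operating on a plain dict, and treating a disabled cache as "not configured" for FIN-003 is an assumption.

```python
def scan_finops(config: dict) -> list:
    """Return the FIN rule IDs that a parsed .drako.yaml dict would trigger."""
    finops = (config.get("policies") or {}).get("finops") or {}
    findings = []

    # FIN-001: tracking.enabled is missing or false.
    if not (finops.get("tracking") or {}).get("enabled", False):
        findings.append("FIN-001")

    # FIN-002: no routing rules defined.
    if not (finops.get("routing") or {}).get("rules"):
        findings.append("FIN-002")

    # FIN-003: no cache section, or cache disabled (assumption).
    if not (finops.get("cache") or {}).get("enabled", False):
        findings.append("FIN-003")

    return findings


config = {
    "policies": {
        "finops": {
            "tracking": {"enabled": True},
            "routing": {"enabled": True, "default_model": "gpt-4o", "rules": []},
        }
    }
}
print(scan_finops(config))
```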

Dashboard

The FinOps dashboard at /finops includes:
  • Summary cards — total monthly spend, top model by cost, and current cache hit rate
  • Cost by model — donut chart of spend distribution across LLM models
  • Cost by agent — bar chart ranking agents by cost
  • Budget tracking — burn-down chart with actual, budgeted, and projected spend
Data is sourced from GET /finops/summary, GET /finops/model-breakdown, GET /finops/agent-breakdown, and GET /finops/budget. Cost breakdowns update hourly.
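
If you want to pull the same data yourself, the sketch below prepares one GET request per endpoint using only the Python standard library. The base URL, the bearer-token header, and the helper name `build_requests` are assumptions for illustration; check your deployment for the actual host and authentication scheme. The response format is not described here, so the sketch stops at building the requests.

```python
from urllib.request import Request

# Endpoint paths as listed above.
ENDPOINTS = (
    "/finops/summary",
    "/finops/model-breakdown",
    "/finops/agent-breakdown",
    "/finops/budget",
)


def build_requests(base_url, api_token):
    """Build one GET request per FinOps endpoint.

    Assumptions: `base_url` is your Drako API host and the API accepts a
    bearer token. Neither is confirmed by this page.
    """
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Accept": "application/json",
    }
    root = base_url.rstrip("/")
    return [Request(root + path, headers=headers, method="GET") for path in ENDPOINTS]


for request in build_requests("https://drako.example.com", "YOUR_TOKEN"):
    print(request.get_method(), request.full_url)
```

Each request can then be sent with `urllib.request.urlopen`. Because cost breakdowns update hourly, polling more often than that is unlikely to return new breakdown data.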
