Integrate Headroom with Agno AI Agents

Headroom integrates with Agno (formerly Phidata) by wrapping your Agno model in HeadroomAgnoModel. The wrapper is a full agno.models.base.Model subclass, so it works with Agno’s Agent, tool loops, reasoning modes, and streaming. Headroom compresses messages before each API call inside the agent’s own invoke cycle, including after tool results are appended.

Installation

pip install "headroom-ai[agno]" agno

Quick start

Wrap any Agno model and pass it to your Agent:

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel

model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))
agent = Agent(model=model)

response = agent.run("What's the capital of France?")

print(f"Tokens saved: {model.total_tokens_saved}")
print(model.get_savings_summary())
# {'total_requests': 1, 'total_tokens_saved': 245, 'average_savings_percent': 12.3}
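If you log savings after every run, a small formatter keeps the output consistent. The sketch below assumes only the dictionary shape shown in the comment above (total_requests, total_tokens_saved, average_savings_percent). format_savings is a hypothetical helper written for illustration, not part of Headroom.

```python
def format_savings(summary: dict) -> str:
    """Render a savings summary dict as a single log-friendly line.

    Assumes the keys shown in the quick-start comment; missing keys fall back to 0.
    """
    requests = summary.get("total_requests", 0)
    saved = summary.get("total_tokens_saved", 0)
    percent = summary.get("average_savings_percent", 0.0)
    return f"{requests} request(s), {saved} tokens saved ({percent:.1f}% average)"


# Sample values copied from the quick-start comment, not measured results.
sample = {"total_requests": 1, "total_tokens_saved": 245, "average_savings_percent": 12.3}
print(format_savings(sample))
```

In real use you would pass the return value of model.get_savings_summary() instead of the sample dictionary.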

Works with any Agno provider:

from agno.models.anthropic import Claude
from agno.models.google import Gemini

claude_model = HeadroomAgnoModel(Claude(id="claude-sonnet-4-20250514"))
gemini_model = HeadroomAgnoModel(Gemini(id="gemini-2.0-flash"))

Full agent example with tools

Tool outputs (JSON, logs, search results) see the largest compression gains, typically a 70–90% reduction:

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.duckduckgo import DuckDuckGoTools
from headroom.integrations.agno import HeadroomAgnoModel

model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))

agent = Agent(
    model=model,
    tools=[DuckDuckGoTools()],
    show_tool_calls=True,
)

response = agent.run("Research the latest AI developments")
print(f"Tokens saved: {model.total_tokens_saved}")
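To put the 70–90% figure in concrete terms, the arithmetic below estimates how many tokens of a tool output would remain at each end of that range. The 20,000-token payload is a made-up size and remaining_tokens is a hypothetical helper. Actual savings depend on the content.

```python
def remaining_tokens(tool_tokens: int, reduction_percent: int) -> int:
    """Tokens left after removing `reduction_percent` percent of a tool output."""
    if not 0 <= reduction_percent <= 100:
        raise ValueError("reduction_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return tool_tokens * (100 - reduction_percent) // 100


tool_tokens = 20_000  # hypothetical size of one search-result payload
best = remaining_tokens(tool_tokens, 90)
worst = remaining_tokens(tool_tokens, 70)
print(f"{tool_tokens} tokens -> between {best} and {worst} tokens")
```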

Observability hooks

Use pre-hooks and post-hooks for detailed tracking without modifying your model:

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import (
    HeadroomAgnoModel,
    HeadroomPreHook,
    HeadroomPostHook,
)

model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))

pre_hook = HeadroomPreHook()
post_hook = HeadroomPostHook(token_alert_threshold=10000)

agent = Agent(
    model=model,
    pre_hooks=[pre_hook],
    post_hooks=[post_hook],
)

response = agent.run("Analyze this large dataset...")

# Check for alerts
if post_hook.alerts:
    print(f"{len(post_hook.alerts)} requests exceeded threshold")

Or use the convenience factory to create both hooks at once:

from headroom.integrations.agno import create_headroom_hooks

pre_hook, post_hook = create_headroom_hooks(
    token_alert_threshold=5000,
    log_level="DEBUG",
)

HeadroomPreHook tracks request counts and timing. Actual token optimization happens inside HeadroomAgnoModel at the model level, not the hook level. The hooks exist for observability only.
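To make the split between optimization and observation concrete, here is a minimal, self-contained sketch of the threshold-alert pattern an observability hook can follow. ThresholdAlertHook and its record method are hypothetical stand-ins written for illustration. They are not Headroom's implementation and do not mirror its hook API.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdAlertHook:
    """Illustrative observer: counts requests and records over-threshold ones."""

    token_alert_threshold: int
    request_count: int = 0
    alerts: list = field(default_factory=list)

    def record(self, tokens_used: int) -> None:
        # Observe only: the hook never changes the request or the response.
        self.request_count += 1
        if tokens_used > self.token_alert_threshold:
            self.alerts.append(
                {"request": self.request_count, "tokens_used": tokens_used}
            )


hook = ThresholdAlertHook(token_alert_threshold=10_000)
for tokens in (4_200, 12_500, 9_999):  # made-up token counts
    hook.record(tokens)

print(f"{len(hook.alerts)} of {hook.request_count} requests exceeded the threshold")
```

The point of the pattern is that the hook only reads numbers and stores alerts. Anything that changes the messages belongs in the model wrapper.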

Streaming support

HeadroomAgnoModel supports both sync and async streaming. The example below shows the async calls:

import asyncio
from agno.models.message import Message
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel

async def process():
    model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))

    # Messages to send; replace with your own conversation
    messages = [Message(role="user", content="Summarize the latest AI developments")]

    # Async single response
    response = await model.aresponse(messages)

    # Async streaming
    async for chunk in model.aresponse_stream(messages):
        print(chunk, end="", flush=True)

asyncio.run(process())

Standalone message optimization

Optimize messages directly without wrapping a model — useful when you need fine-grained control:

from headroom.integrations.agno import optimize_messages

# `messages` is the conversation you would otherwise send to the model
optimized, metrics = optimize_messages(messages, model="gpt-4o")
print(f"Tokens saved: {metrics['tokens_saved']}")
print(f"Transforms applied: {metrics['transforms_applied']}")

Session management

HeadroomAgnoModel accumulates metrics across calls. Reset between sessions with model.reset():

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel

model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))
agent = Agent(model=model)

# Session 1
agent.run("First conversation...")
print(model.get_savings_summary())

# Reset for new session
model.reset()

# Session 2 starts fresh
agent.run("Second conversation...")
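If you run many sessions, the reset-then-summarize routine can be wrapped in a context manager so that no session leaks metrics into the next. The sketch below relies only on the reset() and get_savings_summary() methods shown above. tracked_session is a hypothetical helper, and FakeModel is a stand-in so the example runs without Headroom installed.

```python
from contextlib import contextmanager


@contextmanager
def tracked_session(model, name, log):
    """Reset `model` metrics, run the session body, then store its summary in `log`."""
    model.reset()
    try:
        yield model
    finally:
        log[name] = model.get_savings_summary()


class FakeModel:
    """Stand-in with the two methods used above, so the sketch runs without Headroom."""

    def __init__(self):
        self.saved = 0

    def reset(self):
        self.saved = 0

    def get_savings_summary(self):
        return {"total_tokens_saved": self.saved}


model = FakeModel()
log = {}
with tracked_session(model, "session-1", log):
    model.saved += 120  # stands in for agent.run(...)
with tracked_session(model, "session-2", log):
    model.saved += 30
print(log)
```

With a real HeadroomAgnoModel, the body of each with block would call agent.run(...) and the log would hold one summary per session.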

Agno reasoning modes

HeadroomAgnoModel exposes an underlying_model property for framework introspection — Agno uses this to detect reasoning capabilities:

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel

model = OpenAIChat(id="gpt-4o")
wrapped = HeadroomAgnoModel(wrapped_model=model)

agent = Agent(
    model=wrapped,
    reasoning=True,
    reasoning_model=wrapped.underlying_model,  # Use underlying for type detection
)

Claude’s extended thinking and Agno’s reasoning flow are incompatible, so choose one. For native extended thinking, set thinking={"type": "enabled", "budget_tokens": N} on the Claude model. For Agno’s framework-level chain-of-thought, set reasoning=True on the Agent. Do not enable both. Headroom automatically skips compression for messages that contain extended thinking blocks.
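Because the two options are mutually exclusive, it can help to check the configuration before building the agent. The guard below is a minimal sketch of that check. check_reasoning_config is a hypothetical helper, not part of Headroom or Agno, and it inspects only the thinking dictionary and the reasoning flag described above.

```python
from typing import Optional


def check_reasoning_config(thinking: Optional[dict], reasoning: bool) -> None:
    """Raise if native extended thinking and Agno reasoning are both enabled."""
    thinking_enabled = bool(thinking) and thinking.get("type") == "enabled"
    if thinking_enabled and reasoning:
        raise ValueError(
            "Enable either Claude extended thinking or Agent reasoning=True, not both"
        )


# Each option on its own passes the check.
check_reasoning_config({"type": "enabled", "budget_tokens": 2048}, reasoning=False)
check_reasoning_config(None, reasoning=True)

# Enabling both is rejected.
try:
    check_reasoning_config({"type": "enabled", "budget_tokens": 2048}, reasoning=True)
except ValueError as exc:
    print(f"Rejected: {exc}")
```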

Supported providers

Provider	Agno Model	Auto-Detected
OpenAI	`OpenAIChat`, `OpenAILike`	Yes
Anthropic	`Claude`, `AwsBedrock`	Yes
Google	`Gemini`, `VertexAI`	Yes
Groq	`Groq`	Yes
Mistral	`Mistral`	Yes
Ollama	`Ollama`	Yes
