When agents hand off to each other, context is typically replayed in full: every research finding, every tool output, and every intermediate result is sent again to the next model. SharedContext avoids this by compressing what moves between agents with Headroom’s full pipeline, saving roughly 80% of tokens on a typical handoff while keeping the original available on demand.

Quick Start

from headroom import SharedContext

ctx = SharedContext()

# Agent A stores large output
ctx.put("research", big_research_output, agent="researcher")

# Agent B gets compressed version (~80% smaller)
summary = ctx.get("research")

# Agent B needs full details on something specific
full = ctx.get("research", full=True)

API Reference

put(key, content, agent=None)

Store content under a key. Compresses automatically using Headroom’s full pipeline: SmartCrusher for JSON, CodeCompressor for code, Kompress for text.

entry = ctx.put("findings", big_json_output, agent="researcher")

entry.original_tokens     # 20,000
entry.compressed_tokens   # 4,000
entry.savings_percent     # 80.0
entry.transforms          # ["router:json:0.20"]

The optional agent argument records which agent stored the entry for provenance tracking.

get(key, full=False)

Retrieve content. Returns the compressed version by default; pass full=True for the original.

compressed = ctx.get("findings")              # 4K tokens
original   = ctx.get("findings", full=True)   # 20K tokens
missing    = ctx.get("nonexistent")           # None (not found or expired)
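Because get() returns None for a missing or expired key, callers usually need a fallback. The sketch below shows one possible pattern: a hypothetical helper, get_or_recompute, that re-stores the content when the key is gone. The helper and the DictContext stand-in are illustrative assumptions, not part of Headroom; the helper relies only on the get() and put() signatures documented above.

```python
from typing import Callable, Optional, Protocol


class ContextLike(Protocol):
    """The two SharedContext methods this helper relies on."""

    def get(self, key: str, full: bool = False) -> Optional[str]: ...

    def put(self, key: str, content: str, agent: Optional[str] = None) -> object: ...


def get_or_recompute(
    ctx: ContextLike,
    key: str,
    produce: Callable[[], str],
    agent: Optional[str] = None,
) -> str:
    """Return the compressed entry, re-storing it first if it is missing or expired."""
    cached = ctx.get(key)
    if cached is not None:
        return cached
    fresh = produce()  # only runs when the key is gone
    ctx.put(key, fresh, agent=agent)
    stored = ctx.get(key)
    # Fall back to the uncompressed text if the entry vanished again immediately.
    return stored if stored is not None else fresh


class DictContext:
    """Stand-in for SharedContext (no compression) so the sketch runs on its own."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str, full: bool = False) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, content: str, agent: Optional[str] = None) -> None:
        self._data[key] = content


demo_ctx = DictContext()
summary = get_or_recompute(
    demo_ctx, "research", lambda: "fresh research output", agent="researcher"
)
print(summary)
```

With a real SharedContext in place of DictContext, the same helper would hand back the compressed version whenever the entry is still live.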

get_entry(key)

Get the full ContextEntry object with metadata:

entry = ctx.get_entry("findings")

entry.key                 # "findings"
entry.agent               # "researcher"
entry.original_tokens     # 20,000
entry.compressed_tokens   # 4,000
entry.savings_percent     # 80.0
entry.transforms          # ["router:json:0.20"]
entry.timestamp           # Unix timestamp when stored

stats()

Aggregated statistics across all live (non-expired) entries:

stats = ctx.stats()

stats.entries                  # 3
stats.total_original_tokens    # 60,000
stats.total_compressed_tokens  # 12,000
stats.total_tokens_saved       # 48,000
stats.savings_percent          # 80.0
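The sample figures above are consistent with simple arithmetic: tokens saved is original minus compressed, and the percentage is that difference over the original. The sketch below reproduces the numbers under that assumption. The formula is inferred from the example values, not taken from Headroom's source, and the three identical entries are made up for illustration.

```python
def savings_percent(original_tokens: int, compressed_tokens: int) -> float:
    """Share of tokens removed by compression, as a percentage."""
    if original_tokens <= 0:
        return 0.0  # avoid dividing by zero for an empty context
    return 100.0 * (original_tokens - compressed_tokens) / original_tokens


# Hypothetical data: three entries, each 20,000 tokens compressed to 4,000.
entries = [(20_000, 4_000)] * 3

total_original = sum(original for original, _ in entries)
total_compressed = sum(compressed for _, compressed in entries)
total_saved = total_original - total_compressed

print(total_original, total_compressed, total_saved)
print(savings_percent(total_original, total_compressed))
```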

keys() and clear()

ctx.keys()    # list of all non-expired keys
ctx.clear()   # remove all entries

Configuration

from headroom import SharedContext

ctx = SharedContext(
    model="claude-sonnet-4-5-20250929",  # For token counting
    ttl=3600,                             # Entry TTL in seconds (default: 1 hour)
    max_entries=100,                      # Evicts oldest when full (default: 100)
)

Entries expire after ttl seconds. When max_entries is reached, expired entries are evicted first; then the oldest live entry is removed.
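The eviction order described above can be sketched with a small toy store. This is a minimal illustration of "expired first, then oldest", assuming expiry means stored-at time plus ttl. TtlStore and its injectable clock are hypothetical names, not Headroom's implementation.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TtlStore:
    """Toy key-value store with a TTL and a max_entries cap."""

    def __init__(
        self,
        ttl: float = 3600,
        max_entries: int = 100,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.ttl = ttl
        self.max_entries = max_entries
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._items: Dict[str, Tuple[float, str]] = {}  # key -> (stored_at, value)

    def _expired(self, stored_at: float) -> bool:
        return self._clock() - stored_at >= self.ttl

    def _evict(self) -> None:
        # Step 1: drop every entry that has already expired.
        for key in [k for k, (ts, _) in self._items.items() if self._expired(ts)]:
            del self._items[key]
        # Step 2: if the store is still full, drop the oldest live entries.
        while self._items and len(self._items) >= self.max_entries:
            oldest = min(self._items, key=lambda k: self._items[k][0])
            del self._items[oldest]

    def put(self, key: str, value: str) -> None:
        if key not in self._items and len(self._items) >= self.max_entries:
            self._evict()
        self._items[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[str]:
        item = self._items.get(key)
        if item is None or self._expired(item[0]):
            self._items.pop(key, None)  # lazily forget expired entries
            return None
        return item[1]

    def keys(self) -> List[str]:
        return [k for k, (ts, _) in self._items.items() if not self._expired(ts)]
```

Passing a fake clock (for example, a lambda that reads a mutable counter) lets you step time forward and watch entries expire deterministically.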

Cross-Agent Memory vs. SharedContext

SharedContext

Ephemeral, in-process. Designed for a single workflow run — one orchestrator handing work between agents. Entries live in memory (with a TTL), so they vanish when the process exits. No persistence across sessions.

Cross-Agent Memory

Persistent, cross-session. Designed for long-lived facts (preferences, project conventions) that should be recalled in every future session, regardless of which agent runs. Uses Memory() and is stored durably on disk.

Use SharedContext when you need to pass large intermediate outputs between agents in a single workflow. Use persistent memory when you need to remember something across many future sessions.

Two-Agent Example

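Here is an example of two agents sharing context via SharedContext. fetch_and_compile_research() and generate_code() are placeholders for your own research and code-generation functions:

from headroom import SharedContext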

ctx = SharedContext()

# ── Agent A: Researcher ───────────────────────────────────────────────────────
def researcher_agent(topic: str) -> str:
    # Simulate a large research output
    big_output = fetch_and_compile_research(topic)

    # Store it — automatically compressed
    entry = ctx.put("research_findings", big_output, agent="researcher")
    print(f"Stored {entry.original_tokens} tokens → {entry.compressed_tokens} tokens "
          f"({entry.savings_percent:.1f}% saved)")

    # Return the compressed summary to the orchestrator
    return ctx.get("research_findings")


# ── Agent B: Coder ────────────────────────────────────────────────────────────
def coder_agent() -> str:
    # Get the compressed summary for context window efficiency
    summary = ctx.get("research_findings")

    # Fetch the full original only when generating the actual code
    full_findings = ctx.get("research_findings", full=True)

    return generate_code(summary, full_details=full_findings)


# ── Orchestrator ──────────────────────────────────────────────────────────────
summary = researcher_agent("distributed tracing in Python")
code    = coder_agent()

Framework Examples

from headroom import SharedContext

ctx = SharedContext()

# After researcher task completes
ctx.put("findings", researcher_task.output.raw)

# Coder task gets compressed context
coder_context = ctx.get("findings")

How It Works

Under the hood, put() calls headroom.compress() (the same pipeline used by the proxy and MCP server) and stores the original in memory. get() returns the compressed version; get(full=True) returns the original. The compression pipeline routes each entry to the best compressor automatically:

| Content type | Compressor | Typical savings |
| --- | --- | --- |
| JSON arrays / objects | SmartCrusher | 70–95% |
| Code (Python, JS, Go…) | CodeCompressor (AST-aware) | 40–70% |
| Prose / logs / text | Kompress-v2-base | 30–60% |

The result is the same token budget as if you had summarized manually, with no extra LLM call.

SharedContext uses the same compression pipeline as the proxy but runs in-process with no network overhead. Originals are held in memory (bounded by max_entries and ttl) and are never written to disk.
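To make the flow above concrete, here is a rough sketch of the "detect the content type, compress, keep the original" pattern. It is not Headroom's router: detect_kind, toy_compress, ToyEntry, and ToyContext are hypothetical names, the detection heuristic handles only JSON and Python source, and the "compressor" is a placeholder that truncates.

```python
import ast
import json
from dataclasses import dataclass
from typing import Dict, Optional

# AST node types treated as evidence that the content is Python source.
_STRUCTURAL = (
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
    ast.Import, ast.ImportFrom, ast.Assign,
)


def detect_kind(content: str) -> str:
    """Classify content as 'json', 'code' (Python only in this sketch), or 'text'."""
    try:
        if isinstance(json.loads(content), (dict, list)):
            return "json"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    try:
        tree = ast.parse(content)
    except (SyntaxError, ValueError):
        return "text"
    if any(isinstance(node, _STRUCTURAL) for node in ast.walk(tree)):
        return "code"
    return "text"


def toy_compress(content: str, kind: str) -> str:
    """Placeholder for a real per-kind compressor: keeps a short prefix only."""
    return content[:40]


@dataclass
class ToyEntry:
    original: str
    compressed: str
    kind: str


class ToyContext:
    """Stores both versions so get(full=True) can return the original."""

    def __init__(self) -> None:
        self._entries: Dict[str, ToyEntry] = {}

    def put(self, key: str, content: str) -> ToyEntry:
        kind = detect_kind(content)
        entry = ToyEntry(content, toy_compress(content, kind), kind)
        self._entries[key] = entry
        return entry

    def get(self, key: str, full: bool = False) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        return entry.original if full else entry.compressed
```

Keeping the original next to the compressed form is what makes on-demand retrieval cheap: no decompression step is needed, at the cost of holding both copies in memory.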
