Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/arjunkshah/supercompress/llms.txt

Use this file to discover all available pages before exploring further.

SuperCompress is a Python library that compresses long agent context before every LLM call. Instead of blindly truncating from the head or tail, SuperCompress uses a lightweight ~5K-parameter eviction policy to retain the tokens most relevant to your current query — including answer-bearing lines in the middle of long documents that naive truncation drops entirely.

Quickstart

Get from install to your first compressed context in under five minutes.

How It Works

Understand the learned eviction pipeline and why it beats truncation.

Python API Reference

Full signatures for compress_context, compress_for_turn, and every public export.

Integrations

Wire SuperCompress into OpenAI messages, LangChain agents, or any HTTP client.

Why SuperCompress?

Long agent context is expensive. Every token in the KV cache costs GPU prefill time. Truncation keeps the head and tail but silently drops answers buried in the middle. SuperCompress learns which lines to keep for the current question — under a fixed token budget.
MetricSuperCompressTruncation / FIFO
KV savings @ 35% budget~65%~65%
Oracle recall100%~25%
Policy size~5K paramsrule-based
Runs onCPU (pre-inference)CPU
1

Install

pip install git+https://github.com/arjunkshah/supercompress.git
2

Compress your context

from supercompress import compress_context

result = compress_context(
    "long context text…",
    "What does fetch return when the row is missing?",
    budget_ratio=0.35,
)
print(result.compressed_text)
print(f"{result.kv_savings_pct:.1f}% KV saved · {result.kept_tokens}/{result.original_tokens} tokens")
3

Pass the result to your LLM

Use result.compressed_text wherever you’d pass your original context — it’s a plain string.

Explore the docs

Eviction Policies

FIFO, Truncation, H2O, Summarization, and the learned SuperCompress policy explained.

Benchmarks

Reproducible benchmark results across 8 seeds — oracle recall, entity recall, latency.

Environmental Impact

How tokens saved translates to GPU-seconds, Wh, and CO₂ with documented assumptions.

API Dashboard

Firebase auth, API key management, and per-key usage tracking for the hosted API.

Local Server

Run the FastAPI server locally for development and integration testing.

HTTP API

REST endpoints for the hosted compress service with API key authentication.

Build docs developers (and LLMs) love