

SuperCompress is a Python library for learned context compression. Before each LLM inference call, it uses a small (~5K-parameter) CPU policy to decide which lines of your agent’s context are worth keeping under a fixed token budget — and which can be safely dropped. The result is a shorter prompt that fits in less KV cache, costs less to process, and still contains the information your model needs to answer correctly.

The problem with naive truncation

The standard approach to fitting a long context into a token budget is to keep the head and tail and discard the middle. This is fast, but it has a fundamental flaw: the answer to the current question is often in the middle. A retrieval log, a previous tool response, or a computed value buried between padding lines is silently removed, and the model hallucinates or refuses rather than answering.

SuperCompress treats eviction as a learned ranking problem instead. Given the current user query, it scores every line of context for relevance and retains the highest-scoring lines that fit within the budget. A critical line such as CRITICAL_ANSWER = "404 when row is missing", buried 180 lines deep, survives compression while irrelevant filler is discarded.
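The failure mode described above can be reproduced with a small, self-contained sketch. Everything in it is an illustrative assumption: the whitespace token counter, the `head_tail_truncate` helper, and the keyword-overlap scorer in `keyword_rank_compress` are hypothetical stand-ins written for this example. In particular, the scorer is a toy heuristic, not the learned SuperCompress policy.

```python
import re

def count_tokens(line: str) -> int:
    # Crude whitespace token count (assumption; a real tokenizer differs).
    return len(line.split())

def head_tail_truncate(lines, budget):
    """Keep lines alternately from the head and the tail until the budget is spent."""
    kept, used = set(), 0
    lo, hi = 0, len(lines) - 1
    take_head = True
    while lo <= hi:
        idx = lo if take_head else hi
        cost = count_tokens(lines[idx])
        if used + cost > budget:
            break
        kept.add(idx)
        used += cost
        if take_head:
            lo += 1
        else:
            hi -= 1
        take_head = not take_head
    return [lines[i] for i in sorted(kept)]

def keyword_rank_compress(lines, query, budget):
    """Toy ranking: score each line by word overlap with the query,
    then keep the highest-scoring lines that fit in the budget."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def score(line):
        return len(query_words & set(re.findall(r"\w+", line.lower())))

    order = sorted(range(len(lines)), key=lambda i: score(lines[i]), reverse=True)
    kept, used = set(), 0
    for i in order:
        cost = count_tokens(lines[i])
        if used + cost <= budget:
            kept.add(i)
            used += cost
    # Return kept lines in their original order.
    return [lines[i] for i in sorted(kept)]

# Synthetic context: the answer sits 180 lines deep between filler lines.
filler = [f"log line {i}: nothing notable happened" for i in range(200)]
context = filler[:180] + ['CRITICAL_ANSWER = "404 when row is missing"'] + filler[180:]
query = "What status is returned when the row is missing?"
budget = 60

truncated = head_tail_truncate(context, budget)
ranked = keyword_rank_compress(context, query, budget)

print("truncation kept answer:", any("CRITICAL_ANSWER" in l for l in truncated))
print("ranking kept answer:   ", any("CRITICAL_ANSWER" in l for l in ranked))
```

Both functions spend the same budget. The only difference is which lines they choose to spend it on.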

Benchmark results

The table below compares SuperCompress against standard baselines at a 35% token budget (8 seeds):

| Policy | Oracle recall | Entity recall | KV savings | Policy size |
| --- | --- | --- | --- | --- |
| Truncation / FIFO | 25% | 73% | ~65% | rule-based |
| Summarization | 61% | 65% | ~65% | rule-based |
| H2O | 98% | 73% | ~65% | rule-based |
| SuperCompress | 100% | 73% | ~65% | ~5K params |

SuperCompress reaches 100% oracle recall, meaning the answer line was never lost in this benchmark, while matching the ~65% KV savings of the other approaches. The policy runs entirely on CPU before inference, adding roughly 60 ms of overhead. At 1 million compressions (estimated): ~800M tokens avoided · 29 kWh · 12 kg CO₂ saved. See docs/ENVIRONMENT.md for the full methodology.
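To put the aggregate estimates in per-call terms, the arithmetic below divides the quoted totals by the number of compressions. The input figures are the ones stated above. The implied average context size is a derived assumption (it treats avoided tokens as exactly 65% of the original context) and is not a number from the benchmark itself.

```python
# Figures quoted above, all estimated at 1 million compressions.
compressions = 1_000_000
tokens_avoided = 800_000_000
energy_kwh = 29
co2_kg = 12
kv_savings = 0.65  # ~65% savings at a 35% token budget

tokens_avoided_per_call = tokens_avoided / compressions
energy_wh_per_call = energy_kwh * 1000 / compressions
co2_g_per_call = co2_kg * 1000 / compressions

# Assumption: avoided tokens are exactly 65% of the original context.
implied_context_tokens = tokens_avoided_per_call / kv_savings

# Carbon intensity implied by the two totals, in kg CO2 per kWh.
implied_kg_co2_per_kwh = co2_kg / energy_kwh

print(f"tokens avoided per call: {tokens_avoided_per_call:.0f}")
print(f"energy per call:         {energy_wh_per_call:.3f} Wh")
print(f"CO2 per call:            {co2_g_per_call:.3f} g")
print(f"implied context size:    {implied_context_tokens:.0f} tokens")
print(f"implied intensity:       {implied_kg_co2_per_kwh:.2f} kg CO2/kWh")
```

Under these assumptions each compression avoids about 800 tokens, which corresponds to an original context of roughly 1,200 tokens.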

Public API

SuperCompress exposes seven public symbols from supercompress:

| Symbol | Kind | Purpose |
| --- | --- | --- |
| compress_context | function | Compress one text blob for a given question and budget |
| compress_for_turn | function | Merge multiple context blocks, then compress before a turn |
| compress_detailed | function | Compress with per-line LineAnnotation keep/drop reasoning |
| compare_policies | function | Run FIFO, Truncation, Summarization, H2O, and SuperCompress side by side |
| middle_truncation_failure_case | function | Build the canonical synthetic context that defeats head-and-tail truncation |
| CompressResult | dataclass | Return type of every compress function; holds the trimmed text, token counts, and savings metrics |
| LineAnnotation | dataclass | Per-line annotation returned by compress_detailed; holds kept, reason, and line_index |

Installing the package also registers two CLI entry points: supercompress (run compression from the shell) and supercompress-train (train or fine-tune the eviction policy). Both require Python 3.10 or newer.
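To make the per-line annotation idea concrete, here is a minimal sketch that does not import supercompress at all. `SketchAnnotation` is a hypothetical stand-in that only mirrors the three fields named in the table (kept, reason, line_index). The real LineAnnotation may define different types, defaults, or extra fields, and the reason strings below are invented placeholders.

```python
from dataclasses import dataclass

@dataclass
class SketchAnnotation:
    """Hypothetical stand-in; the real LineAnnotation may differ."""
    line_index: int
    kept: bool
    reason: str

def annotate(lines, kept_indices):
    """Build one annotation per context line from a set of kept indices."""
    annotations = []
    for i in range(len(lines)):
        kept = i in kept_indices
        # Placeholder reasons, not the library's actual wording.
        reason = "selected within budget" if kept else "evicted"
        annotations.append(SketchAnnotation(line_index=i, kept=kept, reason=reason))
    return annotations

lines = [
    "system: you are a helpful agent",
    "tool: padding",
    'CRITICAL_ANSWER = "404 when row is missing"',
    "user: what happens when the row is missing?",
]
annotations = annotate(lines, kept_indices={0, 2, 3})

# Print a keep/drop report, one row per context line.
for a in annotations:
    marker = "KEEP" if a.kept else "DROP"
    print(f"{marker} {a.line_index:>3}  {a.reason}")

dropped = [a.line_index for a in annotations if not a.kept]
```

A report like this is useful for debugging a budget that is too tight: the dropped indices show exactly which lines were evicted.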

Where to go next

Quickstart

Install SuperCompress and run your first compression in under five minutes.

How It Works

Understand the learned eviction policy, token budgets, and the sink-and-recent heuristic.

Eviction Policies

Compare FIFO, Truncation, Summarization, H2O, and the SuperCompress learned policy.

Integrations

Drop SuperCompress into OpenAI, LangChain, and LlamaIndex pipelines.

SuperCompress is released under the MIT License. You are free to use, modify, and distribute it in personal and commercial projects. See the LICENSE file in the repository root for the full text.
