SuperCompress is a Python library for learned context compression. Before each LLM inference call, it uses a small (~5K-parameter) CPU policy to decide which lines of your agent’s context are worth keeping under a fixed token budget and which can be safely dropped. The result is a shorter prompt that fits in less KV cache, costs less to process, and still contains the information your model needs to answer correctly.
The problem with naive truncation
The standard approach to fitting a long context into a token budget is to keep the head and tail and discard whatever is in the middle. This is fast, but it has a fundamental flaw: the answer to the current question is often in the middle. A retrieval log, a previous tool response, or a computed value buried between padding lines is silently removed, and the model hallucinates or refuses rather than answering.
SuperCompress addresses this by treating eviction as a learned ranking problem. Given the current user query, it scores every line of context for relevance and retains the highest-scoring lines that fit within the budget, so a critical line such as `CRITICAL_ANSWER = "404 when row is missing"` buried 180 lines deep survives compression while irrelevant filler is discarded.
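The difference between the two strategies can be seen in a small, self-contained sketch. It does not use SuperCompress itself: `head_tail_truncate` and `ranked_keep` are hypothetical helpers, the budget is counted in lines rather than tokens for simplicity, and a plain word-overlap score stands in for the learned policy.

```python
import re


def head_tail_truncate(lines, budget):
    """Keep the first and last lines that fit in `budget` lines; drop the middle."""
    if len(lines) <= budget:
        return list(lines)
    head = budget // 2
    tail = budget - head
    return lines[:head] + lines[len(lines) - tail:]


def words(text):
    """Lowercased alphanumeric words of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ranked_keep(lines, query, budget):
    """Keep the `budget` lines sharing the most words with the query.

    Word overlap is only a stand-in for a learned relevance score.
    Kept lines are returned in their original order.
    """
    q = words(query)
    by_score = sorted(range(len(lines)), key=lambda i: (-len(q & words(lines[i])), i))
    return [lines[i] for i in sorted(by_score[:budget])]


# Synthetic context: filler with one answer-bearing line buried 180 lines deep.
context = [f"log line {i}: heartbeat ok" for i in range(360)]
context[180] = 'CRITICAL_ANSWER = "404 when row is missing"'
query = "What status is returned when the row is missing?"

budget = 20
truncated = head_tail_truncate(context, budget)
ranked = ranked_keep(context, query, budget)

print(any("CRITICAL_ANSWER" in line for line in truncated))  # False
print(any("CRITICAL_ANSWER" in line for line in ranked))     # True
```

Both strategies keep exactly 20 lines; only the query-aware one keeps the line that matters.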
Benchmark results
The table below compares SuperCompress against standard baselines at a 35% token budget (8 seeds):
| Policy | Oracle recall | Entity recall | KV savings | Policy size |
|---|---|---|---|---|
| Truncation / FIFO | 25% | 73% | ~65% | rule-based |
| Summarization | 61% | 65% | ~65% | rule-based |
| H2O | 98% | 73% | ~65% | rule-based |
| SuperCompress | 100% | 73% | ~65% | ~5K params |
See `docs/ENVIRONMENT.md` for the full methodology.
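One way to read the KV savings column: if savings are measured as the fraction of tokens removed (an assumption here, not a definition taken from the methodology document), a 35% budget implies roughly 65% savings for any policy that fills the budget, which would explain why every row reports the same figure. The helper `kv_savings` and the 2,000-token context below are hypothetical.

```python
def kv_savings(original_tokens: int, kept_tokens: int) -> float:
    """Fraction of tokens removed by compression (assumed definition)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1.0 - kept_tokens / original_tokens


original = 2000                  # hypothetical context size in tokens
budget = original * 35 // 100    # 35% token budget -> 700 tokens kept
print(f"{kv_savings(original, budget):.0%}")  # 65%
```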
Public API
SuperCompress exposes seven public symbols from `supercompress`:
| Symbol | Kind | Purpose |
|---|---|---|
| `compress_context` | function | Compress one text blob for a given question and budget |
| `compress_for_turn` | function | Merge multiple context blocks, then compress before a turn |
| `compress_detailed` | function | Compress with per-line `LineAnnotation` keep/drop reasoning |
| `compare_policies` | function | Run FIFO, Truncation, Summarization, H2O, and SuperCompress side by side |
| `middle_truncation_failure_case` | function | Build the canonical synthetic context that defeats head-and-tail truncation |
| `CompressResult` | dataclass | Return type of every compress function; holds the trimmed text, token counts, and savings metrics |
| `LineAnnotation` | dataclass | Per-line annotation returned by `compress_detailed`; holds `kept`, `reason`, and `line_index` |
The package also provides two command-line tools: `supercompress` (run compression from the shell) and `supercompress-train` (train or fine-tune the eviction policy). Both require Python 3.10 or newer.
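As a rough mental model of the per-line annotations listed in the table above, the stand-in dataclass below mirrors the three fields the table names (`kept`, `reason`, `line_index`). It is not the library's definition: the class name `LineAnnotationSketch`, the field types, and the example reason strings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LineAnnotationSketch:
    """Hypothetical stand-in for a per-line keep/drop annotation."""
    line_index: int  # position of the line in the original context
    kept: bool       # True if the line survived compression
    reason: str      # explanation for the decision (wording is made up here)


# Made-up annotations for a three-line context.
annotations = [
    LineAnnotationSketch(line_index=0, kept=True, reason="matches query terms"),
    LineAnnotationSketch(line_index=1, kept=False, reason="low relevance"),
    LineAnnotationSketch(line_index=2, kept=True, reason="recent line"),
]

# Typical use of such records: list which lines were dropped and why.
dropped = [(a.line_index, a.reason) for a in annotations if not a.kept]
print(dropped)  # [(1, 'low relevance')]
```

Records shaped like this make it easy to audit a compression run, for example by logging every dropped line together with its reason.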
Where to go next
Quickstart
Install SuperCompress and run your first compression in under five minutes.
How It Works
Understand the learned eviction policy, token budgets, and the sink-and-recent heuristic.
Eviction Policies
Compare FIFO, Truncation, Summarization, H2O, and the SuperCompress learned policy.
Integrations
Drop SuperCompress into OpenAI, LangChain, and LlamaIndex pipelines.
SuperCompress is released under the MIT License. You are free to use, modify, and distribute it in personal and commercial projects. See the LICENSE file in the repository root for the full text.