SuperCompress is a Python library that compresses long agent context before every LLM call. Instead of blindly truncating from the head or tail, SuperCompress uses a lightweight ~5K-parameter eviction policy to retain the tokens most relevant to your current query, including answer-bearing lines in the middle of long documents that naive truncation drops entirely.
Quickstart
Get from install to your first compressed context in under five minutes.
How It Works
Understand the learned eviction pipeline and why it beats truncation.
Python API Reference
Full signatures for `compress_context`, `compress_for_turn`, and every public export.
Integrations
Wire SuperCompress into OpenAI messages, LangChain agents, or any HTTP client.
Why SuperCompress?
Long agent context is expensive. Every token in the KV cache costs GPU prefill time. Truncation keeps the head and tail but silently drops answers buried in the middle. SuperCompress learns which lines to keep for the current question, under a fixed token budget.

| Metric | SuperCompress | Truncation / FIFO |
|---|---|---|
| KV savings @ 35% budget | ~65% | ~65% |
| Oracle recall | 100% | ~25% |
| Policy size | ~5K params | rule-based |
| Runs on | CPU (pre-inference) | CPU |
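
The table shows both approaches saving the same share of tokens at the same budget. What differs is which lines survive. The toy sketch below illustrates that contrast without calling the SuperCompress API. `head_tail_truncate`, `query_aware_select`, and `count_tokens` are hypothetical helpers written for this example, tokens are counted by whitespace, and a simple word-overlap score stands in for the learned eviction policy, which this page does not specify.

```python
import re

def count_tokens(line: str) -> int:
    # Crude whitespace count; a real tokenizer would give different numbers.
    return len(line.split())

def head_tail_truncate(lines, budget):
    """Keep lines alternately from the head and the tail until the budget is spent."""
    kept, used = set(), 0
    lo, hi = 0, len(lines) - 1
    take_head = True
    while lo <= hi:
        idx = lo if take_head else hi
        cost = count_tokens(lines[idx])
        if used + cost > budget:
            break
        kept.add(idx)
        used += cost
        if take_head:
            lo += 1
        else:
            hi -= 1
        take_head = not take_head
    return [lines[i] for i in sorted(kept)]

def query_aware_select(lines, query, budget):
    """Stand-in for a learned policy: rank lines by word overlap with the query."""
    q_words = set(re.findall(r"\w+", query.lower()))
    ranked = sorted(
        range(len(lines)),
        key=lambda i: len(q_words & set(re.findall(r"\w+", lines[i].lower()))),
        reverse=True,
    )
    kept, used = [], 0
    for i in ranked:
        cost = count_tokens(lines[i])
        if used + cost <= budget:
            kept.append(i)
            used += cost
    # Restore document order so the compressed context still reads top to bottom.
    return [lines[i] for i in sorted(kept)]

# A long context with the answer buried in the middle (made-up data).
answer = "deploy note: the rollback token is AZURE-42"
context = (
    [f"log entry {i}: routine heartbeat ok" for i in range(10)]
    + [answer]
    + [f"log entry {i}: routine heartbeat ok" for i in range(10, 20)]
)
query = "What is the rollback token?"

total = sum(count_tokens(line) for line in context)
budget = int(0.35 * total)  # keep roughly 35% of the tokens

truncated = head_tail_truncate(context, budget)
selected = query_aware_select(context, query, budget)

for name, kept in [("truncation", truncated), ("query-aware", selected)]:
    used = sum(count_tokens(line) for line in kept)
    print(f"{name}: saved {1 - used / total:.0%}, answer kept: {answer in kept}")
```

Both strategies stay within the same budget, so their token savings are nearly identical. Only the query-aware selection keeps the middle line that answers the question.
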
Explore the docs
Eviction Policies
FIFO, Truncation, H2O, Summarization, and the learned SuperCompress policy explained.
Benchmarks
Reproducible benchmark results across 8 seeds: oracle recall, entity recall, and latency.
Environmental Impact
How saved tokens translate into GPU-seconds, Wh, and CO₂, with documented assumptions.
API Dashboard
Firebase auth, API key management, and per-key usage tracking for the hosted API.
Local Server
Run the FastAPI server locally for development and integration testing.
HTTP API
REST endpoints for the hosted compress service with API key authentication.