SuperCompress is designed to slot into any Python LLM workflow with a single function call. This guide walks you from a fresh install to a working compression pipeline, then shows why naive truncation fails on real agent context and how SuperCompress handles it.
Install SuperCompress
Install directly from GitHub using pip. No PyPI release is required.
The install pulls in the core library along with its two dependencies (torch>=2.0.0 and numpy>=1.24.0) and includes the pretrained checkpoint at checkpoints/default.pt. See the Installation guide for optional extras like the dev tools and local HTTP server.
Compress your context
Import compress_context and pass it your context string, the current user query, and a budget_ratio representing the fraction of tokens to keep.
SuperCompress loads the bundled checkpoint automatically. If the checkpoint is not found, it falls back to the H2OPolicy baseline so the call never raises unexpectedly.
Read the result
compress_context returns a CompressResult dataclass. Beyond the fields you'll use most often, CompressResult also exposes compression_ratio, kept_line_ratio, policy_name, budget_ratio, and the original question used for scoring.
See why truncation fails
The middle_truncation_failure_case() helper builds a synthetic context where a critical answer is buried 180 lines deep, exactly where head-and-tail truncation will always drop it. Running compare_policies() on it shows the difference at a glance.
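The failure mode can be reproduced without the library. The following is a minimal sketch, not the helper's actual implementation: build_context and head_tail_truncate are hypothetical names, and the 400-line length, the needle text, and the 0.25 budget are assumptions chosen for illustration.

```python
def build_context(total_lines: int = 400, needle_index: int = 180) -> list[str]:
    """Synthetic log with one critical line buried in the middle (illustrative only)."""
    lines = [f"log line {i}: routine status update" for i in range(total_lines)]
    lines[needle_index] = "ANSWER: the deploy key rotates on the 14th"
    return lines


def head_tail_truncate(lines: list[str], budget_ratio: float) -> list[str]:
    """Keep the start and end of the context; drop everything in between."""
    if not 0.0 < budget_ratio <= 1.0:
        raise ValueError("budget_ratio must be in (0, 1]")
    budget = max(1, int(len(lines) * budget_ratio))
    if budget >= len(lines):
        return list(lines)
    tail = budget // 2
    head = budget - tail  # the head gets the extra line on odd budgets
    return lines[:head] + lines[len(lines) - tail:]


lines = build_context()
needle = lines[180]
kept = head_tail_truncate(lines, budget_ratio=0.25)

# 100 of 400 lines survive (indices 0-49 and 350-399), so line 180 is gone.
print(len(kept), needle in kept)
```

No choice of budget below 1.0 helps much here: the kept region grows inward from both ends, so a line near the middle is the last one to be recovered.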
compare_policies() runs all five policies — FIFO, Truncation, Summarization, H2O, and SuperCompress — over the same context and returns a dict[str, CompressResult] so you can benchmark them side by side on your own data too.
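For intuition about what a side-by-side comparison involves, here is a toy harness in the same spirit. It is a sketch under stated assumptions and does not use the library: ToyResult, compare_toy_policies, and both policies are hypothetical, kept_lines is an illustrative field, and keyword overlap is a deliberately simple stand-in, not SuperCompress's learned scoring. Only the names policy_name and kept_line_ratio are taken from this guide.

```python
from dataclasses import dataclass
from typing import Callable

# A toy policy maps (lines, query, line budget) to the lines it keeps.
Policy = Callable[[list[str], str, int], list[str]]


@dataclass
class ToyResult:
    """Hypothetical stand-in for CompressResult."""
    policy_name: str
    kept_line_ratio: float
    kept_lines: list[str]  # assumption: illustrative field, not from the guide


def fifo(lines: list[str], query: str, budget: int) -> list[str]:
    """Keep only the most recent lines."""
    return lines[-budget:]


def keyword_overlap(lines: list[str], query: str, budget: int) -> list[str]:
    """Keep the lines sharing the most words with the query, in original order."""
    terms = set(query.lower().split())
    ranked = sorted(
        range(len(lines)),
        key=lambda i: len(terms & set(lines[i].lower().split())),
        reverse=True,
    )
    return [lines[i] for i in sorted(ranked[:budget])]


def compare_toy_policies(
    lines: list[str], query: str, budget_ratio: float, policies: dict[str, Policy]
) -> dict[str, ToyResult]:
    """Run every policy over the same context with the same line budget."""
    budget = max(1, int(len(lines) * budget_ratio))
    results = {}
    for name, policy in policies.items():
        kept = policy(lines, query, budget)
        results[name] = ToyResult(name, len(kept) / len(lines), kept)
    return results


lines = [f"step {i}: heartbeat ok" for i in range(400)]
lines[180] = "the staging database password rotates every friday"
query = "when does the staging database password rotate"

results = compare_toy_policies(
    lines, query, 0.25, {"fifo": fifo, "keyword_overlap": keyword_overlap}
)
for name, res in results.items():
    survived = lines[180] in res.kept_lines
    print(f"{name:<16} kept={res.kept_line_ratio:.2f} needle_survived={survived}")
```

Both toy policies keep the same fraction of lines, which is the point of comparing at a fixed budget: the only difference is which lines survive.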
Other exported functions
SuperCompress exports two additional functions for common agent patterns.
compress_for_turn accepts a list of context blocks (tool responses, memory chunks, prior turns) and merges them before compressing. Use it when your agent builds context from multiple sources.
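This guide does not specify how compress_for_turn merges its blocks. As a rough mental model only, the merge step can be pictured as an ordered join of the non-empty blocks. The helper name merge_blocks and the newline separator are assumptions, not the library's behavior.

```python
def merge_blocks(blocks: list[str], separator: str = "\n") -> str:
    """Join non-empty context blocks in order, trimming surrounding whitespace."""
    cleaned = [block.strip() for block in blocks if block and block.strip()]
    return separator.join(cleaned)


# Illustrative blocks an agent might collect during one turn.
tool_response = "search: 3 results for 'rate limit'\n"
memory_chunk = ""  # empty blocks are skipped
prior_turn = "  user: what is our current rate limit?"

turn_context = merge_blocks([tool_response, memory_chunk, prior_turn])
print(turn_context)
```

Keeping the blocks in their original order matters for any policy that treats position as a signal, such as the FIFO and truncation baselines.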
compress_detailed returns the same CompressResult plus a list of LineAnnotation objects, one per input line, each carrying line_index, text, kept (bool), and reason (e.g. "learned retention score", "attention sink (always kept)"). Use it when you need to inspect or visualize exactly which lines were evicted and why.
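One way to inspect annotations of this shape is a plain-text view with one row per line. The sketch below uses a stand-in dataclass so it runs without the library: StandInAnnotation mirrors the four field names listed above, while render, evicted_indices, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StandInAnnotation:
    """Stand-in mirroring the LineAnnotation fields named in this guide."""
    line_index: int
    text: str
    kept: bool
    reason: str


def render(annotations: list[StandInAnnotation]) -> list[str]:
    """One row per input line: '+' marks kept lines, '-' marks evicted ones."""
    return [
        f"{'+' if a.kept else '-'} {a.line_index:>4} | {a.text}  [{a.reason}]"
        for a in annotations
    ]


def evicted_indices(annotations: list[StandInAnnotation]) -> list[int]:
    """Indices of the lines that were dropped."""
    return [a.line_index for a in annotations if not a.kept]


# Made-up sample values for illustration; not output from the library.
sample = [
    StandInAnnotation(0, "system: you are a helpful agent", True, "attention sink (always kept)"),
    StandInAnnotation(1, "tool: heartbeat ok", False, "learned retention score"),
    StandInAnnotation(2, "tool: deploy key rotates on the 14th", True, "learned retention score"),
]

for row in render(sample):
    print(row)
print("evicted:", evicted_indices(sample))
```

Because the helpers read only the four attributes, the same functions should work on any objects exposing line_index, text, kept, and reason.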