Every public type used by SuperCompress is exported directly from the top-level supercompress package. The types fall into four groups: the compression result dataclasses (CompressResult, LineAnnotation), the sustainability estimation types (SustainabilityEstimate, SustainabilityAssumptions), the eviction policy abstract base class, and its built-in implementations. All of the dataclasses below are importable from supercompress without any sub-module path.
CompressResult
CompressResult is the return type of compress_context, compress_for_turn, and compare_policies. It is a standard Python dataclass with two computed properties.
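The two computed properties documented in this section reduce to simple arithmetic on the token counts. The following is a minimal, self-contained sketch and not the library's source: the class name CompressResultSketch and the property names kv_reduction_pct and compression_ratio are hypothetical stand-ins, and only the formulas and zero guards are taken from the field descriptions on this page.

```python
from dataclasses import dataclass


@dataclass
class CompressResultSketch:
    """Hypothetical stand-in holding only the two token counts."""

    original_tokens: int
    kept_tokens: int

    @property
    def kv_reduction_pct(self) -> float:
        # Hypothetical property name. max(..., 1) guards against division
        # by zero when original_tokens is zero.
        return (1 - self.kept_tokens / max(self.original_tokens, 1)) * 100

    @property
    def compression_ratio(self) -> float:
        # Hypothetical property name. Returns 0.0 when nothing was kept.
        if self.kept_tokens == 0:
            return 0.0
        return self.original_tokens / self.kept_tokens


result = CompressResultSketch(original_tokens=1000, kept_tokens=250)
print(result.kv_reduction_pct)   # 75.0
print(result.compression_ratio)  # 4.0
```

Note that under the documented formula an empty input (both counts zero) reports a 100 % reduction and a ratio of 0.0, so callers may want to special-case the "noop" policy name when aggregating metrics.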
- The full input context string before any eviction. Stored verbatim so you can diff it against compressed_text if needed.
- The eviction output: the subset of lines and tokens that survived the budget cut, ready to be sent directly to your LLM.
- Number of tokens in original_text as counted by the internal tokeniser.
- Number of tokens retained in compressed_text after eviction.
- The budget_ratio value used for this compression call.
- The user query passed to the compression call, stored here for downstream metrics and logging.
- Fraction of source lines present in compressed_text. This is typically higher than kept_tokens / original_tokens because attention-sink and recent-context lines are always retained regardless of the budget.
- Human-readable name of the eviction policy that ran. Common values: "SuperCompress", "H2O-fallback", "FIFO", "Truncation", "H2O", "Summarization", "noop" (empty input).
- Computed property. Percentage of KV-cache entries eliminated: (1 − kept_tokens / max(original_tokens, 1)) × 100. Uses max(original_tokens, 1) as a guard against division by zero when original_tokens is zero.
- Computed property. Ratio of original to kept tokens: original_tokens / kept_tokens. Returns 0.0 when kept_tokens is zero.

LineAnnotation
LineAnnotation is returned by compress_detailed as one element per source line. It explains the keep/drop decision at line granularity.
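Because each annotation carries a keep flag and a reason, the list is easy to summarise. The sketch below does not call the library: LineAnnotationSketch and its attribute names (index, text, kept, reason) are hypothetical stand-ins, and the sample data is a hand-written excerpt from an imagined longer input. Only the reason strings are taken from this page.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LineAnnotationSketch:
    """Hypothetical stand-in; the real attribute names may differ."""

    index: int
    text: str
    kept: bool
    reason: str


# Hand-written excerpt from an imagined longer input, not real output.
annotations = [
    LineAnnotationSketch(0, "System: you are a helpful assistant.", True,
                         "attention sink (always kept)"),
    LineAnnotationSketch(1, "User: hello there.", True,
                         "attention sink (always kept)"),
    LineAnnotationSketch(12, "Assistant: an unrelated aside.", False,
                         "evicted by policy"),
    LineAnnotationSketch(13, "User: the invoice number is 4417.", True,
                         "question entity match"),
]

# Tally how often each keep/drop reason occurred.
reason_counts = Counter(a.reason for a in annotations)

# Rebuild the retained text in original line order.
ordered = sorted(annotations, key=lambda a: a.index)
kept_text = "\n".join(a.text for a in ordered if a.kept)

# Collect the indices of evicted lines for inspection.
evicted_indices = [a.index for a in annotations if not a.kept]

print(reason_counts["attention sink (always kept)"])  # 2
print(evicted_indices)  # [12]
```

A tally like this is a quick way to see whether a budget is being spent mostly on always-kept lines or on lines the policy actually scored.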
- Zero-based index of this line in the original text.
- The raw content of the line as it appeared in the input (no trailing newline).
- True if this line appears in the compressed output; False if it was evicted.
- One of five string literals explaining why this line was kept or dropped:
  - "attention sink (always kept)": line index 0 or 1; always retained.
  - "recent context (always kept)": one of the last 8 lines; always retained.
  - "question entity match": line contains a named entity extracted from question.
  - "learned retention score": policy scored this line above the eviction threshold.
  - "evicted by policy": line did not meet any retention criterion.
SustainabilityEstimate
SustainabilityEstimate is returned by sustainability_from_tokens_saved in supercompress.benchmarks.metrics. It translates a token savings figure into illustrative environmental impact numbers.
- Number of tokens eliminated by compression (clamped to 0 if negative).
- Estimated GPU-seconds avoided, derived from tokens_saved × kv_share_of_prefill / tokens_per_gpu_second.
- Estimated watt-hours saved: gpu_seconds_avoided × gpu_watts / 3600.
- Estimated kilograms of CO₂ avoided: watt_hours_saved × grid_kg_co2_per_kwh / 1000.
- The SustainabilityAssumptions dataclass used for this calculation (see below).

SustainabilityAssumptions
SustainabilityAssumptions is a frozen dataclass that holds the constants used by sustainability_from_tokens_saved. All fields have documented defaults; override any of them by constructing a custom instance and passing it as the assumptions argument.
- Assumed throughput of the GPU in tokens per second. Default: 2500.0.
- Assumed power draw of the GPU in watts. Default: 150.0.
- Carbon intensity of the electricity grid in kg CO₂ per kWh. Default: 0.417.
- Fraction of prefill compute attributed to KV-cache processing. Default: 0.55.

To override these defaults, pass a custom SustainabilityAssumptions to sustainability_from_tokens_saved.
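The arithmetic behind these estimates can be reproduced by hand. The sketch below re-derives the three formulas from this page using the documented defaults. It does not call the library: AssumptionsSketch and estimate are hypothetical stand-ins, and the field names are inferred from the formulas above, which is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AssumptionsSketch:
    """Hypothetical stand-in mirroring the documented defaults."""

    tokens_per_gpu_second: float = 2500.0
    gpu_watts: float = 150.0
    grid_kg_co2_per_kwh: float = 0.417
    kv_share_of_prefill: float = 0.55


def estimate(tokens_saved: int, a: AssumptionsSketch = AssumptionsSketch()):
    """Return (gpu_seconds_avoided, watt_hours_saved, kg_co2_avoided)."""
    tokens_saved = max(tokens_saved, 0)  # negative savings clamp to 0
    gpu_seconds = tokens_saved * a.kv_share_of_prefill / a.tokens_per_gpu_second
    watt_hours = gpu_seconds * a.gpu_watts / 3600
    kg_co2 = watt_hours * a.grid_kg_co2_per_kwh / 1000
    return gpu_seconds, watt_hours, kg_co2


# One million tokens saved under the default assumptions.
gpu_s, wh, kg = estimate(1_000_000)
print(f"{gpu_s:.1f} GPU-s, {wh:.2f} Wh, {kg:.5f} kg CO2")

# A frozen dataclass cannot be mutated, so build a modified copy instead.
low_carbon = replace(AssumptionsSketch(), grid_kg_co2_per_kwh=0.05)
_, _, kg_low = estimate(1_000_000, low_carbon)
```

With the defaults, one million saved tokens works out to roughly 220 GPU-seconds, 9.2 Wh, and 0.0038 kg of CO₂, which shows how small the per-call figures are and why the grid intensity assumption dominates the final number.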
All sustainability figures are illustrative estimates based on the assumptions above; they are not measured values from your specific hardware or deployment environment. See the project’s ENVIRONMENT.md for the full methodology.

EvictionPolicy
EvictionPolicy is the abstract base class that all compression policies implement. It lives in supercompress.policies and defines a single abstract method.
select receives a list of TokenRecord objects (one per token in the input) and the integer token budget, and must return a list of token position indices to retain.
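A custom policy only needs to implement select. The sketch below is self-contained and uses hypothetical stand-ins (TokenRecordSketch with a single position field, and EvictionPolicySketch) because the real TokenRecord fields and the exact select signature are not shown on this page; treat the shapes as assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class TokenRecordSketch:
    """Hypothetical stand-in; the real TokenRecord likely has more fields."""

    position: int


class EvictionPolicySketch(ABC):
    """Hypothetical stand-in for the abstract base class."""

    name = "base"

    @abstractmethod
    def select(self, records: list[TokenRecordSketch], budget: int) -> list[int]:
        """Return the token position indices to retain."""


class KeepMostRecent(EvictionPolicySketch):
    """Keeps the last `budget` tokens and drops everything older."""

    name = "KeepMostRecent"

    def select(self, records, budget):
        if budget <= 0:
            # records[-0:] would return the whole list, so handle it here.
            return []
        return [r.position for r in records[-budget:]]


records = [TokenRecordSketch(position=i) for i in range(10)]
kept = KeepMostRecent().select(records, budget=4)
print(kept)  # [6, 7, 8, 9]
```

Returning position indices rather than the records themselves keeps the policy decoupled from how the caller reassembles the compressed text.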
Built-in implementations
All of the following are importable from supercompress.policies:
| Class | name attribute | Description |
|---|---|---|
| FIFO | "FIFO" | Drops the oldest tokens; keeps the most recent budget tokens. |
| LRU | "LRU" | Keeps tokens with the highest recency score. |
| SlidingWindow | "Sliding Window" | Fixed window on the recent half plus always-retained attention sinks (first 5 %). |
| TruncationPolicy | "Truncation" | Head-and-tail: keeps attention sinks plus the most recent tokens. |
| SummarizationPolicy | "Summarization" | Extractive: keeps lines with the highest entity overlap with the question. Accepts an optional question string at construction. |
| H2OPolicy | "H2O" | Heavy Hitter Oracle: retains sinks, a recent window, and top cumulative-attention tokens. Accepts sink_tokens and recent_ratio at construction. |
| LearnedPolicy | "Learned Policy" | Top-k by EvictionPolicyNetwork keep-score. Requires a pre-loaded model and optional device. |
| AttentionHeuristicPolicy | "Attention Heuristic" | Non-learned baseline: keeps tokens with the highest attention mass. |
| SnapKVPolicy | "SnapKV" | SnapKV-style: scores prefix tokens by attention from an observation window at the sequence end. |
| OraclePolicy | "Oracle" | Upper-bound oracle: keeps all oracle-important tokens first, then fills remaining budget with the most recent. |
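To make the head-and-tail idea from the table concrete, here is a minimal sketch of index selection that keeps a few leading sink tokens plus the most recent tokens. It is not the library's TruncationPolicy: the function name head_and_tail and the sink_tokens default of 4 are assumptions chosen for illustration.

```python
def head_and_tail(n_tokens: int, budget: int, sink_tokens: int = 4) -> list[int]:
    """Return sorted indices to keep: leading sinks first, then the tail."""
    budget = max(0, min(budget, n_tokens))   # never keep more than exists
    head = min(sink_tokens, budget)          # sinks never exceed the budget
    tail = budget - head                     # remainder goes to recent tokens
    tail_start = max(n_tokens - tail, head)  # tail begins after the head
    return list(range(head)) + list(range(tail_start, n_tokens))


print(head_and_tail(n_tokens=20, budget=8))  # [0, 1, 2, 3, 16, 17, 18, 19]
```

The middle of the sequence is what gets dropped, which is why this style of baseline tends to lose facts stated mid-context even though it preserves the opening instructions and the latest turns.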