Every public type used by SuperCompress is exported from the top-level supercompress package. The types fall into three groups: the compression result dataclasses (CompressResult, LineAnnotation), the sustainability estimation types (SustainabilityEstimate, SustainabilityAssumptions), and the eviction policy abstract base class with its built-in implementations. The imports below use each type's defining module: supercompress for the result dataclasses, supercompress.policies for the policies, and supercompress.benchmarks.metrics for the sustainability types.
from supercompress import CompressResult, LineAnnotation
from supercompress.policies import EvictionPolicy, FIFO, H2OPolicy
from supercompress.benchmarks.metrics import SustainabilityEstimate, SustainabilityAssumptions

CompressResult

CompressResult is the return type of compress_context, compress_for_turn, and compare_policies. It is a standard Python dataclass with two computed properties.
original_text
str
The full input context string before any eviction. Stored verbatim so you can diff it against compressed_text if needed.
compressed_text
str
The eviction output — the subset of lines and tokens that survived the budget cut, ready to be sent directly to your LLM.
original_tokens
int
Number of tokens in original_text as counted by the internal tokeniser.
kept_tokens
int
Number of tokens retained in compressed_text after eviction.
budget_ratio
float
The budget_ratio value used for this compression call.
question
str
The user query passed to the compression call, stored here for downstream metrics and logging.
kept_line_ratio
float
Fraction of source lines present in compressed_text. This is typically higher than kept_tokens / original_tokens because attention-sink and recent-context lines are always retained regardless of the budget.
policy_name
str
Human-readable name of the eviction policy that ran. Common values: "SuperCompress", "H2O-fallback", "FIFO", "Truncation", "H2O", "Summarization", "noop" (empty input).
kv_savings_pct
float
Computed property. Percentage of KV-cache entries eliminated: (1 − kept_tokens / max(original_tokens, 1)) × 100. The max(original_tokens, 1) term guards against division by zero on empty input.
compression_ratio
float
Computed property. Ratio of original to kept tokens: original_tokens / kept_tokens. Returns 0.0 when kept_tokens is zero.
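
The two computed properties can be reproduced from the token counts alone. The sketch below is a minimal illustration of the formulas documented above; ResultSketch is a hypothetical stand-in written for this example, not the library's CompressResult class.

```python
from dataclasses import dataclass


@dataclass
class ResultSketch:
    """Hypothetical stand-in carrying only the two token counts."""
    original_tokens: int
    kept_tokens: int

    @property
    def kv_savings_pct(self) -> float:
        # (1 - kept / max(original, 1)) * 100; max() guards original == 0
        return (1 - self.kept_tokens / max(self.original_tokens, 1)) * 100

    @property
    def compression_ratio(self) -> float:
        # original / kept, with 0.0 when nothing was kept
        if self.kept_tokens == 0:
            return 0.0
        return self.original_tokens / self.kept_tokens


r = ResultSketch(original_tokens=1000, kept_tokens=250)
print(r.kv_savings_pct)     # 75.0
print(r.compression_ratio)  # 4.0
```

Keeping a quarter of the tokens therefore reads as 75% KV savings or, equivalently, a 4× compression ratio.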

LineAnnotation

LineAnnotation is returned by compress_detailed as one element per source line. It explains the keep/drop decision at line granularity.
line_index
int
Zero-based index of this line in the original text.
text
str
The raw content of the line as it appeared in the input (no trailing newline).
kept
bool
True if this line appears in the compressed output; False if it was evicted.
reason
str
One of five string literals explaining why this line was kept or dropped:
  • "attention sink (always kept)" — line index 0 or 1; always retained.
  • "recent context (always kept)" — one of the last 8 lines; always retained.
  • "question entity match" — line contains a named entity extracted from question.
  • "learned retention score" — policy scored this line above the eviction threshold.
  • "evicted by policy" — line did not meet any retention criterion.
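
One way to use these annotations is to tally the reasons and list the evicted line indices. The sketch below assumes only the four fields documented above; AnnotationSketch and summarize are hypothetical names for this example, and the sample rows are hand-written (an excerpt from an imagined 40-line input), not output from compress_detailed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AnnotationSketch:
    """Hypothetical stand-in with the four documented LineAnnotation fields."""
    line_index: int
    text: str
    kept: bool
    reason: str


def summarize(annotations):
    """Count lines per reason and collect the indices of evicted lines."""
    reasons = Counter(a.reason for a in annotations)
    dropped = [a.line_index for a in annotations if not a.kept]
    return reasons, dropped


# Hand-written sample rows, not real library output.
sample = [
    AnnotationSketch(0, "System: you are a helpful assistant.", True, "attention sink (always kept)"),
    AnnotationSketch(1, "User: notes from Monday", True, "attention sink (always kept)"),
    AnnotationSketch(17, "Lunch was fine.", False, "evicted by policy"),
    AnnotationSketch(18, "The invoice is due on Friday.", True, "question entity match"),
]

reasons, dropped = summarize(sample)
print(dropped)                                   # [17]
print(reasons["attention sink (always kept)"])   # 2
```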

SustainabilityEstimate

SustainabilityEstimate is returned by sustainability_from_tokens_saved in supercompress.benchmarks.metrics. It translates a token savings figure into illustrative environmental impact numbers.
from supercompress.benchmarks.metrics import sustainability_from_tokens_saved

saved = result.original_tokens - result.kept_tokens
impact = sustainability_from_tokens_saved(saved)
print(impact.to_dict())
tokens_saved
int
Number of tokens eliminated by compression (clamped to 0 if negative).
gpu_seconds_avoided
float
Estimated GPU-seconds avoided, derived from tokens_saved × kv_share_of_prefill / tokens_per_gpu_second.
watt_hours_saved
float
Estimated watt-hours saved: gpu_seconds_avoided × gpu_watts / 3600.
co2_kg_avoided
float
Estimated kilograms of CO₂ avoided: watt_hours_saved × grid_kg_co2_per_kwh / 1000.
assumptions
SustainabilityAssumptions
The SustainabilityAssumptions dataclass used for this calculation (see below).
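
The three formulas chain together, so a worked example may help. The sketch below re-implements them as documented on this page, using the default constants documented for SustainabilityAssumptions; AssumptionsSketch and estimate are hypothetical names for this illustration and are not the library's own code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssumptionsSketch:
    """Hypothetical stand-in using the documented default constants."""
    tokens_per_gpu_second: float = 2500.0
    gpu_watts: float = 150.0
    grid_kg_co2_per_kwh: float = 0.417
    kv_share_of_prefill: float = 0.55


def estimate(tokens_saved: int, a: AssumptionsSketch = AssumptionsSketch()):
    """Return (gpu_seconds, watt_hours, co2_kg) per the documented formulas."""
    tokens_saved = max(tokens_saved, 0)  # negative savings clamp to 0
    gpu_seconds = tokens_saved * a.kv_share_of_prefill / a.tokens_per_gpu_second
    watt_hours = gpu_seconds * a.gpu_watts / 3600    # watt-seconds to watt-hours
    co2_kg = watt_hours * a.grid_kg_co2_per_kwh / 1000  # Wh to kWh, then kg CO2
    return gpu_seconds, watt_hours, co2_kg


gpu_s, wh, co2 = estimate(50_000)
print(f"{gpu_s:.2f} GPU-s, {wh:.4f} Wh, {co2:.3e} kg CO2")
# 11.00 GPU-s, 0.4583 Wh, 1.911e-04 kg CO2
```

Under the defaults, 50,000 saved tokens work out to about 11 GPU-seconds, which shows how small the per-call figures are and why they are best read as aggregate estimates.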

SustainabilityAssumptions

SustainabilityAssumptions is a frozen dataclass that holds the constants used by sustainability_from_tokens_saved. All fields have documented defaults; override any of them by constructing a custom instance and passing it as the assumptions argument.
tokens_per_gpu_second
float
Assumed throughput of the GPU in tokens per second. Default: 2500.0.
gpu_watts
float
Assumed power draw of the GPU in watts. Default: 150.0.
grid_kg_co2_per_kwh
float
Carbon intensity of the electricity grid in kg CO₂ per kWh. Default: 0.417.
kv_share_of_prefill
float
Fraction of prefill compute attributed to KV-cache processing. Default: 0.55.
Override defaults by passing a custom SustainabilityAssumptions to sustainability_from_tokens_saved:
from supercompress.benchmarks.metrics import (
    SustainabilityAssumptions,
    sustainability_from_tokens_saved,
)

custom = SustainabilityAssumptions(
    tokens_per_gpu_second=5000.0,  # faster hardware
    gpu_watts=300.0,               # higher-power GPU
    grid_kg_co2_per_kwh=0.233,     # cleaner grid
    kv_share_of_prefill=0.55,
)
impact = sustainability_from_tokens_saved(tokens_saved=50_000, assumptions=custom)
All sustainability figures are illustrative estimates based on the assumptions above — they are not measured values from your specific hardware or deployment environment. See the project’s ENVIRONMENT.md for the full methodology.

EvictionPolicy

EvictionPolicy is the abstract base class that all compression policies implement. It lives in supercompress.policies and defines a single abstract method.
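from supercompress.policies import EvictionPolicy

# Definition, shown for reference. It relies on ABC and abstractmethod from abc
# and List from typing; TokenRecord is the library's per-token record type.
class EvictionPolicy(ABC):
    name: str = "base"  # human-readable policy name

    @abstractmethod
    def select(self, records: List[TokenRecord], budget: int) -> List[int]:
        ...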
select receives a list of TokenRecord objects (one per token in the input) and the integer token budget, and must return a list of token position indices to retain.
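
To illustrate the contract, here is a minimal sketch of a custom policy. So that it runs on its own, it defines local stand-ins for EvictionPolicy and TokenRecord; the TokenRecord field shown is a placeholder, since the real fields are not listed on this page, and StridePolicy is a hypothetical example rather than a built-in. It also assumes the returned positions are indices into the records list.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class TokenRecord:
    """Hypothetical placeholder; the real TokenRecord fields are not shown here."""
    text: str


class EvictionPolicy(ABC):
    """Local copy of the interface above, so the sketch runs standalone."""
    name: str = "base"

    @abstractmethod
    def select(self, records: List[TokenRecord], budget: int) -> List[int]:
        ...


class StridePolicy(EvictionPolicy):
    """Illustrative policy: keep `budget` evenly spaced positions."""
    name = "Stride"

    def select(self, records: List[TokenRecord], budget: int) -> List[int]:
        n = len(records)
        if budget <= 0 or n == 0:
            return []
        if budget >= n:
            return list(range(n))  # everything fits, keep all positions
        # n > budget, so consecutive values differ by at least 1 (no duplicates)
        return [i * n // budget for i in range(budget)]


records = [TokenRecord(t) for t in "a b c d e f g h i j".split()]
kept = StridePolicy().select(records, budget=4)
print(kept)  # [0, 2, 5, 7]
```

With the real library, the subclass would inherit from supercompress.policies.EvictionPolicy instead of the local stand-in.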

Built-in implementations

All of the following are importable from supercompress.policies:
| Class | name attribute | Description |
| --- | --- | --- |
| FIFO | "FIFO" | Drops the oldest tokens; keeps the most recent budget tokens. |
| LRU | "LRU" | Keeps tokens with the highest recency score. |
| SlidingWindow | "Sliding Window" | Fixed window on the recent half plus always-retained attention sinks (first 5%). |
| TruncationPolicy | "Truncation" | Head-and-tail: keeps attention sinks plus the most recent tokens. |
| SummarizationPolicy | "Summarization" | Extractive: keeps lines with the highest entity overlap with the question. Accepts an optional question string at construction. |
| H2OPolicy | "H2O" | Heavy Hitter Oracle: retains sinks, a recent window, and top cumulative-attention tokens. Accepts sink_tokens and recent_ratio at construction. |
| LearnedPolicy | "Learned Policy" | Top-k by EvictionPolicyNetwork keep-score. Requires a pre-loaded model and optional device. |
| AttentionHeuristicPolicy | "Attention Heuristic" | Non-learned baseline: keeps tokens with the highest attention mass. |
| SnapKVPolicy | "SnapKV" | SnapKV-style: scores prefix tokens by attention from an observation window at the sequence end. |
| OraclePolicy | "Oracle" | Upper-bound oracle: keeps all oracle-important tokens first, then fills remaining budget with the most recent. |
from supercompress.policies import (
    FIFO,
    LRU,
    SlidingWindow,
    TruncationPolicy,
    SummarizationPolicy,
    H2OPolicy,
    LearnedPolicy,
    AttentionHeuristicPolicy,
    SnapKVPolicy,
    OraclePolicy,
)

# Example: compare FIFO and H2O side-by-side.
# `text` is your context string and `question` is the user query, defined elsewhere.
from supercompress import compress_context

result_fifo = compress_context(text, question, policy=FIFO())
result_h2o  = compress_context(text, question, policy=H2OPolicy(sink_tokens=4, recent_ratio=0.2))

print(f"FIFO saved {result_fifo.kv_savings_pct:.1f}%")
print(f"H2O  saved {result_h2o.kv_savings_pct:.1f}%")
