SuperCompress Python Types — Dataclasses and Policies

The public types used by SuperCompress fall into three groups: the compression result dataclasses (CompressResult, LineAnnotation), the sustainability estimation types (SustainabilityEstimate, SustainabilityAssumptions), and the eviction policy abstract base class with its built-in implementations. The result dataclasses are imported from the top-level supercompress package, the policy classes from supercompress.policies, and the sustainability types from supercompress.benchmarks.metrics:

from supercompress import CompressResult, LineAnnotation
from supercompress.policies import EvictionPolicy, FIFO, H2OPolicy
from supercompress.benchmarks.metrics import SustainabilityEstimate, SustainabilityAssumptions

CompressResult

CompressResult is the return type of compress_context, compress_for_turn, and compare_policies. It is a standard Python dataclass with two computed properties.

original_text

str

The full input context string before any eviction. Stored verbatim so you can diff it against compressed_text if needed.

compressed_text

str

The eviction output — the subset of lines and tokens that survived the budget cut, ready to be sent directly to your LLM.

original_tokens

int

Number of tokens in original_text as counted by the internal tokeniser.

kept_tokens

int

Number of tokens retained in compressed_text after eviction.

budget_ratio

float

The budget_ratio value used for this compression call.

question

str

The user query passed to the compression call, stored here for downstream metrics and logging.

kept_line_ratio

float

Fraction of source lines present in compressed_text. This is typically higher than kept_tokens / original_tokens because attention-sink and recent-context lines are always retained regardless of the budget.

policy_name

str

Human-readable name of the eviction policy that ran. Common values: "SuperCompress", "H2O-fallback", "FIFO", "Truncation", "H2O", "Summarization", "noop" (empty input).

kv_savings_pct

float

Computed property. Percentage of KV-cache entries eliminated: (1 − kept_tokens / max(original_tokens, 1)) × 100. The max(original_tokens, 1) guard prevents division by zero when the input is empty.

compression_ratio

float

Computed property. Ratio of original to kept tokens: original_tokens / kept_tokens. Returns 0.0 when kept_tokens is zero.
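
The two computed properties can be reproduced by hand from the token counts. The sketch below is a minimal stand-in, not the library's own class: ResultSketch is a hypothetical name, and it carries only the two fields the formulas need.

```python
from dataclasses import dataclass


@dataclass
class ResultSketch:
    """Hypothetical stand-in holding only the two token counts."""
    original_tokens: int
    kept_tokens: int

    @property
    def kv_savings_pct(self) -> float:
        # (1 - kept / max(original, 1)) * 100, as documented above
        return (1 - self.kept_tokens / max(self.original_tokens, 1)) * 100

    @property
    def compression_ratio(self) -> float:
        # original / kept, with 0.0 returned when nothing was kept
        if self.kept_tokens == 0:
            return 0.0
        return self.original_tokens / self.kept_tokens


r = ResultSketch(original_tokens=1000, kept_tokens=250)
print(r.kv_savings_pct)     # 75.0
print(r.compression_ratio)  # 4.0
```

Keeping a quarter of the tokens therefore reads as 75 % savings or, equivalently, a 4× compression ratio.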

LineAnnotation

LineAnnotation is returned by compress_detailed as one element per source line. It explains the keep/drop decision at line granularity.

line_index

int

Zero-based index of this line in the original text.

text

str

The raw content of the line as it appeared in the input (no trailing newline).

kept

bool

True if this line appears in the compressed output; False if it was evicted.

reason

str

One of five string literals explaining why this line was kept or dropped:

"attention sink (always kept)" — line index 0 or 1; always retained.
"recent context (always kept)" — one of the last 8 lines; always retained.
"question entity match" — line contains a named entity extracted from question.
"learned retention score" — policy scored this line above the eviction threshold.
"evicted by policy" — line did not meet any retention criterion.

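Because reason is a plain string, a list of annotations is easy to summarise. The sketch below assumes nothing about compress_detailed beyond the four fields documented above: it redefines LineAnnotation as a local stand-in and uses a hand-written excerpt from a hypothetical longer document, and summarize_reasons is an illustrative helper, not part of the library.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class LineAnnotation:
    """Local stand-in mirroring the documented fields."""
    line_index: int
    text: str
    kept: bool
    reason: str


def summarize_reasons(annotations: Iterable[LineAnnotation]) -> Dict[str, int]:
    """Count how many lines fall under each reason string."""
    return dict(Counter(a.reason for a in annotations))


# Hand-written sample data, not real library output.
sample: List[LineAnnotation] = [
    LineAnnotation(0, "You are a helpful assistant.", True, "attention sink (always kept)"),
    LineAnnotation(1, "Project: Apollo", True, "attention sink (always kept)"),
    LineAnnotation(7, "Apollo ships in March.", True, "question entity match"),
    LineAnnotation(12, "Lunch menu: soup.", False, "evicted by policy"),
    LineAnnotation(13, "Parking is on level 2.", False, "evicted by policy"),
]

counts = summarize_reasons(sample)
dropped = [a.line_index for a in sample if not a.kept]
print(counts)
print(dropped)  # [12, 13]
```
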
SustainabilityEstimate

SustainabilityEstimate is returned by sustainability_from_tokens_saved in supercompress.benchmarks.metrics. It translates a token savings figure into illustrative environmental impact numbers.

from supercompress.benchmarks.metrics import sustainability_from_tokens_saved

# `result` is a CompressResult returned by an earlier compression call
saved = result.original_tokens - result.kept_tokens
impact = sustainability_from_tokens_saved(saved)
print(impact.to_dict())

tokens_saved

int

Number of tokens eliminated by compression (clamped to 0 if negative).

gpu_seconds_avoided

float

Estimated GPU-seconds avoided, derived from tokens_saved × kv_share_of_prefill / tokens_per_gpu_second.

watt_hours_saved

float

Estimated watt-hours saved: gpu_seconds_avoided × gpu_watts / 3600.

co2_kg_avoided

float

Estimated kilograms of CO₂ avoided: watt_hours_saved × grid_kg_co2_per_kwh / 1000.

assumptions

SustainabilityAssumptions

The SustainabilityAssumptions dataclass used for this calculation (see below).

SustainabilityAssumptions

SustainabilityAssumptions is a frozen dataclass that holds the constants used by sustainability_from_tokens_saved. All fields have documented defaults; override any of them by constructing a custom instance and passing it as the assumptions argument.

tokens_per_gpu_second

float

Assumed throughput of the GPU in tokens per second. Default: 2500.0.

gpu_watts

float

Assumed power draw of the GPU in watts. Default: 150.0.

grid_kg_co2_per_kwh

float

Carbon intensity of the electricity grid in kg CO₂ per kWh. Default: 0.417.

kv_share_of_prefill

float

Fraction of prefill compute attributed to KV-cache processing. Default: 0.55.

Override defaults by passing a custom SustainabilityAssumptions to sustainability_from_tokens_saved:

from supercompress.benchmarks.metrics import (
    SustainabilityAssumptions,
    sustainability_from_tokens_saved,
)

custom = SustainabilityAssumptions(
    tokens_per_gpu_second=5000.0,  # faster hardware
    gpu_watts=300.0,               # higher-power GPU
    grid_kg_co2_per_kwh=0.233,     # cleaner grid
    kv_share_of_prefill=0.55,
)
impact = sustainability_from_tokens_saved(tokens_saved=50_000, assumptions=custom)

All sustainability figures are illustrative estimates based on the assumptions above — they are not measured values from your specific hardware or deployment environment. See the project’s ENVIRONMENT.md for the full methodology.
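
To make the arithmetic concrete, the formulas listed under SustainabilityEstimate can be traced with the documented defaults. This is a sketch under stated assumptions rather than the library's implementation: AssumptionsSketch and estimate are hypothetical names, and the dictionary keys simply reuse the documented field names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AssumptionsSketch:
    """Hypothetical stand-in with the documented field names and defaults."""
    tokens_per_gpu_second: float = 2500.0
    gpu_watts: float = 150.0
    grid_kg_co2_per_kwh: float = 0.417
    kv_share_of_prefill: float = 0.55


def estimate(tokens_saved: int, a: AssumptionsSketch = AssumptionsSketch()) -> dict:
    tokens_saved = max(tokens_saved, 0)  # negative savings clamp to 0
    gpu_seconds = tokens_saved * a.kv_share_of_prefill / a.tokens_per_gpu_second
    watt_hours = gpu_seconds * a.gpu_watts / 3600            # W*s -> Wh
    co2_kg = watt_hours * a.grid_kg_co2_per_kwh / 1000       # Wh -> kWh -> kg
    return {
        "tokens_saved": tokens_saved,
        "gpu_seconds_avoided": gpu_seconds,
        "watt_hours_saved": watt_hours,
        "co2_kg_avoided": co2_kg,
    }


defaults = estimate(50_000)
# 50,000 * 0.55 / 2500 is about 11 GPU-seconds,
# 11 * 150 / 3600 is about 0.458 Wh,
# 0.458 * 0.417 / 1000 is about 0.000191 kg CO2.
print(defaults)

# A frozen dataclass cannot be mutated; dataclasses.replace builds a modified copy.
cleaner_grid = estimate(50_000, replace(AssumptionsSketch(), grid_kg_co2_per_kwh=0.233))
print(cleaner_grid["co2_kg_avoided"])
```

Under these defaults, 50,000 saved tokens correspond to well under a gram of CO₂, which is why the figures are best read as relative indicators.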

EvictionPolicy

EvictionPolicy is the abstract base class that all compression policies implement. It lives in supercompress.policies and defines a single abstract method.

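from abc import ABC, abstractmethod
from typing import List

# Interface as defined in supercompress.policies.
# TokenRecord is the library's per-token record type; it is quoted here
# as a forward reference so the snippet stands alone.
class EvictionPolicy(ABC):
    name: str = "base"  # overridden by each concrete policy

    @abstractmethod
    def select(self, records: List["TokenRecord"], budget: int) -> List[int]:
        """Return the token position indices to retain."""
        ...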

select receives a list of TokenRecord objects (one per token in the input) and the integer token budget, and must return a list of token position indices to retain.
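
A custom policy only has to implement select. The sketch below is illustrative and makes two assumptions: it redefines a minimal EvictionPolicy stand-in so it runs without the library installed, and it treats records as opaque objects because the fields of TokenRecord are not described here. UniformStridePolicy is a hypothetical policy that keeps evenly spaced positions.

```python
from abc import ABC, abstractmethod
from typing import Any, List


class EvictionPolicy(ABC):
    """Stand-in mirroring the documented interface."""
    name: str = "base"

    @abstractmethod
    def select(self, records: List[Any], budget: int) -> List[int]:
        ...


class UniformStridePolicy(EvictionPolicy):
    """Hypothetical policy: keep `budget` evenly spaced token positions."""
    name = "Uniform Stride"

    def select(self, records: List[Any], budget: int) -> List[int]:
        n = len(records)
        if budget <= 0 or n == 0:
            return []
        if budget >= n:
            return list(range(n))  # everything fits, nothing to evict
        # Integer arithmetic yields `budget` distinct, increasing indices.
        return [(i * n) // budget for i in range(budget)]


kept = UniformStridePolicy().select(list(range(10)), budget=4)
print(kept)  # [0, 2, 5, 7]
```

With the real library, the subclass would inherit from supercompress.policies.EvictionPolicy instead of the stand-in and could be passed wherever a policy instance is accepted.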

Built-in implementations

All of the following are importable from supercompress.policies:

| Class | `name` attribute | Description |
| --- | --- | --- |
| `FIFO` | `"FIFO"` | Drops the oldest tokens; keeps the most recent `budget` tokens. |
| `LRU` | `"LRU"` | Keeps tokens with the highest recency score. |
| `SlidingWindow` | `"Sliding Window"` | Fixed window on the recent half plus always-retained attention sinks (first 5 %). |
| `TruncationPolicy` | `"Truncation"` | Head-and-tail: keeps attention sinks plus the most recent tokens. |
| `SummarizationPolicy` | `"Summarization"` | Extractive: keeps lines with the highest entity overlap with the question. Accepts an optional `question` string at construction. |
| `H2OPolicy` | `"H2O"` | Heavy Hitter Oracle: retains sinks, a recent window, and top cumulative-attention tokens. Accepts `sink_tokens` and `recent_ratio` at construction. |
| `LearnedPolicy` | `"Learned Policy"` | Top-k by `EvictionPolicyNetwork` keep-score. Requires a pre-loaded `model` and optional `device`. |
| `AttentionHeuristicPolicy` | `"Attention Heuristic"` | Non-learned baseline: keeps tokens with the highest attention mass. |
| `SnapKVPolicy` | `"SnapKV"` | SnapKV-style: scores prefix tokens by attention from an observation window at the sequence end. |
| `OraclePolicy` | `"Oracle"` | Upper-bound oracle: keeps all oracle-important tokens first, then fills remaining budget with the most recent. |

from supercompress.policies import (
    FIFO,
    LRU,
    SlidingWindow,
    TruncationPolicy,
    SummarizationPolicy,
    H2OPolicy,
    LearnedPolicy,
    AttentionHeuristicPolicy,
    SnapKVPolicy,
    OraclePolicy,
)

# Example: compare FIFO and H2O side by side
from supercompress import compress_context

text = "..."      # placeholder: the long context to compress
question = "..."  # placeholder: the user query

result_fifo = compress_context(text, question, policy=FIFO())
result_h2o = compress_context(text, question, policy=H2OPolicy(sink_tokens=4, recent_ratio=0.2))

print(f"FIFO saved {result_fifo.kv_savings_pct:.1f}%")
print(f"H2O  saved {result_h2o.kv_savings_pct:.1f}%")
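
The H2O description in the table (sinks, a recent window, then the top cumulative-attention tokens) can be sketched as a plain function. This is not the H2OPolicy implementation: h2o_style_select is a hypothetical name, the scores argument stands in for cumulative attention, and sizing the recent window as recent_ratio × budget is an assumption, since the table does not say what the ratio is taken of.

```python
from typing import List, Sequence


def h2o_style_select(
    scores: Sequence[float],
    budget: int,
    sink_tokens: int = 4,
    recent_ratio: float = 0.2,
) -> List[int]:
    """Return sorted indices to keep: sinks, recent window, then heavy hitters."""
    n = len(scores)
    if budget <= 0 or n == 0:
        return []
    if budget >= n:
        return list(range(n))

    # 1. Attention sinks: the first few positions.
    keep = set(range(min(sink_tokens, budget)))

    # 2. Recent window (assumed here to be a share of the budget).
    n_recent = min(int(budget * recent_ratio), budget - len(keep))
    keep.update(range(n - n_recent, n))

    # 3. Fill what is left with the highest-scoring remaining positions.
    remaining = budget - len(keep)
    candidates = sorted(
        (i for i in range(n) if i not in keep),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep.update(candidates[:remaining])
    return sorted(keep)


scores = [0.1, 0.1, 0.9, 0.2, 0.8, 0.1, 0.3, 0.1, 0.1, 0.1]
selected = h2o_style_select(scores, budget=5, sink_tokens=2, recent_ratio=0.2)
print(selected)  # [0, 1, 2, 4, 9]
```

Positions 0 and 1 survive as sinks, position 9 as the recent window, and positions 2 and 4 as the two heaviest hitters.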
