compress_context() — Compress a Single Context String

compress_context is the core function of SuperCompress. Given a full context string and the current user query, it scores every token for relevance and evicts the lowest-scoring ones until the output fits within your budget_ratio. Use it whenever you have a single block of text — a conversation history, a retrieved document, or a combined system prompt — and want to reduce KV-cache pressure before calling your LLM.

Function signature

# Import with: from supercompress import compress_context

def compress_context(
    text: str,
    question: str,
    budget_ratio: float = 0.35,
    policy: Optional[EvictionPolicy] = None,
    checkpoint: Optional[str] = None,
) -> CompressResult: ...

Parameters

text

str

required

The full context string to compress. This is the raw text you would otherwise pass directly to your LLM — conversation history, retrieved passages, tool outputs, or anything else that contributes to context length.

question

str

required

The current user query. SuperCompress uses this to score token relevance: tokens that overlap with named entities, keywords, and semantic patterns found in the question receive higher retention scores.

budget_ratio

float

default: 0.35

Fraction of tokens to retain, in the half-open interval (0, 1]. For example, 0.35 keeps 35% of tokens and evicts 65%. Values outside this range raise ValueError.

policy

EvictionPolicy

An explicit eviction policy object. When provided, this overrides both the learned checkpoint and the H2O fallback entirely. Pass any class that implements the EvictionPolicy ABC — for example FIFO(), TruncationPolicy(), or H2OPolicy(). When None, the policy is loaded from checkpoint.

checkpoint

str

Path to a trained weights file (.pt). Defaults to the bundled checkpoints/default.pt that ships with the package. Only used when policy is None.
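The precedence between policy and checkpoint described above can be summarized in a short sketch. The helper describe_policy_source is hypothetical and not part of SuperCompress; it only labels which source would be chosen. It assumes the fallback triggers whenever the checkpoint path is not an existing file, and it resolves the default path relative to the working directory rather than the installed package.

```python
from pathlib import Path
from typing import Optional


def describe_policy_source(
    policy: Optional[object],
    checkpoint: Optional[str],
    default_checkpoint: str = "checkpoints/default.pt",  # assumed location
) -> str:
    """Hypothetical helper: label which policy source would win."""
    if policy is not None:
        # An explicit policy overrides both the checkpoint and the fallback.
        return "explicit policy"
    path = Path(checkpoint if checkpoint is not None else default_checkpoint)
    if path.is_file():
        # No explicit policy, and a weights file is available.
        return "learned checkpoint"
    # No explicit policy and no weights file found.
    return "H2O-fallback"
```

In short: an explicit policy always wins, a readable checkpoint comes next, and the H2O fallback is the last resort.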

Raises

ValueError is raised if budget_ratio is not in the range (0, 1]. Values of 0 or below, and values greater than 1, are rejected before any compression work begins.
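As an illustration of the documented constraint, the check below accepts only values in (0, 1]. The function check_budget_ratio and its error message are assumptions made for this sketch; the library's own validation code and wording may differ.

```python
def check_budget_ratio(budget_ratio: float) -> float:
    """Hypothetical check mirroring the documented (0, 1] constraint."""
    if not (0 < budget_ratio <= 1):
        raise ValueError(f"budget_ratio must be in (0, 1], got {budget_ratio!r}")
    return budget_ratio


# Try two valid and two invalid values.
for value in (0.35, 1.0, 0.0, 1.5):
    try:
        check_budget_ratio(value)
        print(value, "accepted")
    except ValueError as exc:
        print(value, "rejected:", exc)
```

If budget_ratio comes from user configuration, running a check like this up front lets you report the problem before building the context string.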

Empty input behaviour

Passing an empty string — or a string that is only whitespace — never raises an error. compress_context detects this case immediately and returns a CompressResult with policy_name="noop", original_tokens=0, and kept_tokens=0. The compressed_text field will mirror the original (empty) input.
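The empty-input short circuit can be pictured roughly as follows. NoopResult is a hypothetical stand-in that carries only the four fields mentioned above, and noop_if_empty is an illustrative helper, not a SuperCompress function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoopResult:
    """Hypothetical stand-in for CompressResult (subset of fields)."""
    compressed_text: str
    original_tokens: int
    kept_tokens: int
    policy_name: str


def noop_if_empty(text: str) -> Optional[NoopResult]:
    """Return a no-op result for empty or whitespace-only text, else None."""
    if text.strip():
        return None  # real content: normal compression would proceed
    return NoopResult(
        compressed_text=text,  # mirrors the input unchanged
        original_tokens=0,
        kept_tokens=0,
        policy_name="noop",
    )
```

Callers can therefore treat policy_name == "noop" as the signal that nothing was compressed.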

Returns

Returns a CompressResult dataclass.

original_text

str

The full input context string before any eviction — identical to the text argument passed in.

compressed_text

str

The evicted-and-rejoined output text, ready to be passed directly to your LLM.

original_tokens

int

Total number of tokens in text before compression.

kept_tokens

int

Number of tokens retained after eviction.

kv_savings_pct

float

Percentage of KV-cache entries eliminated: (1 − kept_tokens / max(original_tokens, 1)) × 100. Uses max(original_tokens, 1) to prevent division by zero on empty input.

compression_ratio

float

Ratio of original to kept tokens: original_tokens / kept_tokens. Set to 0.0 when kept_tokens is 0, which avoids division by zero.

policy_name

str

The name of the policy that ran. Typical values: "SuperCompress", "H2O-fallback", "FIFO", "Truncation", and "noop" for empty input.

kept_line_ratio

float

Fraction of source lines retained, including attention-sink and recent-context lines that are always kept regardless of budget.

budget_ratio

float

The budget_ratio value that was used for this call.

question

str

The question string passed to this call (stored for reference and downstream metrics).

See Types for the full CompressResult field reference.
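The two derived fields, kv_savings_pct and compression_ratio, follow directly from the token counts. The sketch below recomputes them from the formulas given in the field descriptions; the function names are chosen for illustration and are not library functions.

```python
def kv_savings_pct(original_tokens: int, kept_tokens: int) -> float:
    """Percentage of KV-cache entries eliminated, per the documented formula."""
    return (1 - kept_tokens / max(original_tokens, 1)) * 100


def compression_ratio(original_tokens: int, kept_tokens: int) -> float:
    """Original-to-kept token ratio; 0.0 when nothing was kept."""
    return original_tokens / kept_tokens if kept_tokens else 0.0


# Worked example: 1000 tokens in, 350 kept (budget_ratio = 0.35).
print(f"{kv_savings_pct(1000, 350):.1f}% KV saved")  # 65.0% KV saved
print(f"{compression_ratio(1000, 350):.2f}x")        # 2.86x
```

The two numbers describe the same compression from different angles: a 0.35 budget removes 65% of KV entries, which is roughly a 2.86x reduction in context length.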

Examples

Basic usage

from supercompress import compress_context

result = compress_context(
    "long context text…",
    "What does fetch return when the row is missing?",
    budget_ratio=0.35,
)
print(result.compressed_text)   # send to your LLM
print(f"{result.kv_savings_pct:.1f}% KV saved")
print(f"{result.kept_tokens}/{result.original_tokens} tokens retained")

Explicit policy override

If you want deterministic, non-learned compression — for example during unit tests or when comparing baselines — pass a policy directly:

from supercompress import compress_context
from supercompress.policies import FIFO

result = compress_context(text, question, policy=FIFO())

When no checkpoint file is found and no policy is provided, compress_context automatically falls back to H2OPolicy (Heavy Hitter Oracle). The policy_name field on the returned result will read "H2O-fallback" so you can detect this at runtime.
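A minimal way to act on the fallback signal is to inspect policy_name after each call. The helper used_fallback and its log message are assumptions for this sketch; it works on any object with a policy_name attribute, so a SimpleNamespace stands in for a real CompressResult here.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def used_fallback(result) -> bool:
    """Return True, and log a warning, when the result reports the H2O fallback."""
    if result.policy_name == "H2O-fallback":
        logger.warning("No checkpoint found; compression used the H2O fallback policy.")
        return True
    return False


# Stand-in object for illustration; in real use, pass the CompressResult.
fake_result = SimpleNamespace(policy_name="H2O-fallback")
print(used_fallback(fake_result))  # True
```

Running a check like this in production makes a missing or misplaced checkpoint file visible instead of silently changing compression behaviour.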
