compress_context is the core function of SuperCompress. Given a full context string and the current user query, it scores every token for relevance and evicts the lowest-scoring ones until the output fits within your budget_ratio. Use it whenever you have a single block of text — a conversation history, a retrieved document, or a combined system prompt — and want to reduce KV-cache pressure before calling your LLM.
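To make the score-then-evict idea concrete, here is a minimal, self-contained sketch. It is not SuperCompress's implementation: the whitespace tokenizer, the keyword-overlap score, the rounding rule, the ValueError, and the helper name toy_compress are all assumptions made for illustration, standing in for the learned policy.

```python
import math


def toy_compress(text: str, question: str, budget_ratio: float = 0.35) -> str:
    """Illustrative stand-in for query-aware eviction (NOT the SuperCompress policy)."""
    # Assumption: ratios outside (0, 1] are rejected; the exception type is a guess.
    if not 0.0 < budget_ratio <= 1.0:
        raise ValueError("budget_ratio must be in (0, 1]")

    tokens = text.split()  # Assumption: whitespace tokens, not model tokens.
    if not tokens:
        return text  # nothing to evict

    # Lower-cased question words with trailing punctuation stripped.
    query_terms = {w.strip(".,?!").lower() for w in question.split()}

    # Toy relevance score: 1.0 if the token appears in the question, else 0.0.
    scores = [1.0 if t.strip(".,?!").lower() in query_terms else 0.0 for t in tokens]

    # Keep the highest-scoring tokens; ties go to the earlier token.
    keep_count = max(1, math.floor(len(tokens) * budget_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    kept = sorted(ranked[:keep_count])  # restore original token order

    return " ".join(tokens[i] for i in kept)
```

A real policy would score far more than literal word overlap, but the shape is the same: score each token against the question, rank, and keep only as many as the budget allows.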
Function signature
Parameters
The full context string to compress. This is the raw text you would otherwise pass directly to your LLM — conversation history, retrieved passages, tool outputs, or anything else that contributes to context length.
The current user query. SuperCompress uses this to score token relevance: tokens that overlap with named entities, keywords, and semantic patterns found in the question receive higher retention scores.
Fraction of tokens to retain, expressed as a value in (0, 1]. For example, 0.35 keeps 35% of tokens and evicts 65%. Must be strictly greater than 0 and at most 1.
An explicit eviction policy object. When provided, this overrides both the learned checkpoint and the H2O fallback entirely. Pass an instance of any class that implements the EvictionPolicy ABC, for example FIFO(), TruncationPolicy(), or H2OPolicy(). When None, the policy is loaded from checkpoint.
Path to a trained weights file (.pt). Defaults to the bundled checkpoints/default.pt that ships with the package. Only used when policy is None.
Raises
Empty input behaviour
Passing an empty string, or a string that is only whitespace, never raises an error. compress_context detects this case immediately and returns a CompressResult with policy_name="noop", original_tokens=0, and kept_tokens=0. The compressed_text field mirrors the original (empty) input.
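The short-circuit described above can be sketched as follows. NoopResult and empty_input_guard are hypothetical names used only for illustration; the four field names and the "noop" value come from this page, and the real CompressResult carries more fields than shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NoopResult:
    """Hypothetical stand-in holding only the fields named above."""
    compressed_text: str
    original_tokens: int
    kept_tokens: int
    policy_name: str


def empty_input_guard(text: str) -> Optional[NoopResult]:
    """Return a no-op result for empty or whitespace-only input, else None."""
    if text.strip():
        return None  # non-empty: the caller would go on to run the real policy
    return NoopResult(
        compressed_text=text,  # mirrors the original (empty) input
        original_tokens=0,
        kept_tokens=0,
        policy_name="noop",
    )
```

In practice this means callers can pass possibly-empty context without wrapping the call in a try/except, and can check policy_name to detect that nothing was compressed.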
Returns
Returns a CompressResult dataclass.
The full input context string before any eviction, identical to the text argument passed in.
The evicted-and-rejoined output text, ready to be passed directly to your LLM.
Total number of tokens in text before compression.
Number of tokens retained after eviction.
Percentage of KV-cache entries eliminated: (1 − kept_tokens / max(original_tokens, 1)) × 100. Uses max(original_tokens, 1) to prevent division by zero on empty input.
Ratio of original to kept tokens: original_tokens / kept_tokens. Returns 0.0 if no tokens were kept.
The name of the policy that ran. Typical values: "SuperCompress", "H2O-fallback", "FIFO", "Truncation".
Fraction of source lines retained, including attention-sink and recent-context lines that are always kept regardless of budget.
The budget_ratio value that was used for this call.
The question string passed to this call (stored for reference and downstream metrics).
CompressResult field reference.
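The two derived metrics above follow directly from original_tokens and kept_tokens. The sketch below restates the documented formulas as standalone functions; the function names savings_percent and compression_ratio are hypothetical and are not claimed to match the library's field names.

```python
def savings_percent(original_tokens: int, kept_tokens: int) -> float:
    """Percentage of KV-cache entries eliminated, per the formula above."""
    # max(..., 1) guards against division by zero on empty input.
    return (1 - kept_tokens / max(original_tokens, 1)) * 100


def compression_ratio(original_tokens: int, kept_tokens: int) -> float:
    """Ratio of original to kept tokens; 0.0 when nothing was kept."""
    if kept_tokens == 0:
        return 0.0
    return original_tokens / kept_tokens
```

For instance, keeping 350 of 1,000 tokens corresponds to a saving of about 65% and a compression ratio of roughly 2.86.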