compress_for_turn() — Merge and Compress Context Blocks

compress_for_turn is designed for the common agent pattern where a single LLM turn is assembled from several independent context sources: a system prompt, one or more tool outputs, retrieved documents, and an ongoing chat history. Rather than requiring you to concatenate those blocks yourself, compress_for_turn accepts them as a list and handles the merge-then-compress pipeline in one call. Use this function whenever your context is naturally partitioned into labelled sections that you would otherwise manually join before sending to compress_context.

Function signature

from supercompress import compress_for_turn

def compress_for_turn(
    context_blocks: List[str],
    user_query: str,
    budget_ratio: float = 0.35,
) -> tuple[str, CompressResult]: ...
# Returns (compressed_text, result)

Parameters

context_blocks: List[str], required

An ordered list of context strings. Each element can be any length. Empty strings and whitespace-only strings are silently skipped before merging, so it is safe to include optional blocks that may be empty at runtime.

user_query: str, required

The current user message for this turn. It is passed directly to the underlying compress_context call to drive token relevance scoring: tokens semantically related to this query are more likely to be retained.

budget_ratio: float, default 0.35

Token retention fraction in (0, 1]. Forwarded unchanged to compress_context. A value of 0.35 retains 35% of the merged token count.
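As a quick worked example of the ratio arithmetic, the sketch below turns a merged token count into a target token count. The helper name target_token_count is hypothetical, and rounding to the nearest token is an assumption: this page does not say how compress_context counts tokens or rounds the budget.

```python
def target_token_count(merged_tokens: int, budget_ratio: float = 0.35) -> int:
    """Approximate number of tokens kept for a given budget_ratio.

    Hypothetical helper for illustration only; rounding to the nearest
    token is an assumption, not documented library behaviour.
    """
    # The documented valid range is the half-open interval (0, 1].
    if not 0.0 < budget_ratio <= 1.0:
        raise ValueError("budget_ratio must be in (0, 1]")
    return round(merged_tokens * budget_ratio)


print(target_token_count(2000))       # 2000 merged tokens at the default 0.35
print(target_token_count(2000, 1.0))  # a ratio of 1.0 keeps every token
```

The actual retained count may differ slightly from this estimate, depending on the tokenizer and the eviction policy in use.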

How blocks are merged

After filtering out empty strings, the remaining blocks are joined with the separator "\n\n---\n\n". The resulting merged string is then passed to compress_context with the same user_query and budget_ratio. The --- separator lines act as clear boundaries between sections so that per-line scoring does not bleed across blocks.
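The merge step described above can be sketched in a few lines. This is a minimal illustration, not the library's source: the name merge_blocks is hypothetical, and the sketch assumes kept blocks are joined unmodified (no stripping of leading or trailing whitespace).

```python
from typing import List

# Separator string quoted in the description above.
SEPARATOR = "\n\n---\n\n"


def merge_blocks(context_blocks: List[str]) -> str:
    """Drop empty and whitespace-only blocks, then join the rest in order."""
    kept = [block for block in context_blocks if block.strip()]
    return SEPARATOR.join(kept)


merged = merge_blocks(
    [
        "## System Notes\nBe concise.",
        "",        # empty optional block: skipped
        "   \n",   # whitespace-only block: skipped
        "## Tool Output\nstatus: ok",
    ]
)
print(merged)
```

Because skipped blocks contribute no separator, an optional block that is empty at runtime leaves no stray --- line in the merged text.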

Returns

Returns a two-element tuple.

compressed_text: str

The compressed output string: the merged context after eviction, ready to be used directly as your LLM prompt. This is identical to result.compressed_text and is surfaced at the top level for convenience.

result: CompressResult

The full CompressResult from the underlying compress_context call, including token counts, savings percentages, and the name of the policy that ran. result.original_text contains the merged (pre-compression) string.

Example

from supercompress import compress_for_turn

compressed, stats = compress_for_turn(
    context_blocks=[
        "## System Notes\n…",
        "## Tool Output\n…",
        "## Chat History\n…",
    ],
    user_query="Summarize the API",
    budget_ratio=0.35,
)
# Use compressed directly as your LLM prompt
print(f"Saved {stats.kv_savings_pct:.1f}% KV cache")

Because compress_for_turn calls compress_context internally, it inherits the same checkpoint-loading and H2O-fallback behaviour. If you need a specific eviction policy, call compress_context directly with the merged string and your chosen policy= argument.
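One way to follow that advice is to reproduce the merge yourself and hand the result to compress_context. The sketch below takes the compression function as an argument, so it runs without the library installed. The wrapper name compress_blocks_with_policy is hypothetical, and the call shape (merged text and query as positional arguments, budget_ratio and policy as keyword arguments) is an assumption; check the compress_context reference for the real parameter names and the accepted policy values.

```python
from typing import Any, Callable, List

# Same separator that compress_for_turn is documented to use.
SEPARATOR = "\n\n---\n\n"


def compress_blocks_with_policy(
    context_blocks: List[str],
    user_query: str,
    policy: str,
    compress_fn: Callable[..., Any],
    budget_ratio: float = 0.35,
) -> Any:
    """Merge blocks as described above, then compress with an explicit policy.

    compress_fn is expected to be compress_context; the positional and
    keyword argument layout used here is an assumption, not a documented
    signature.
    """
    # Skip empty and whitespace-only blocks, then join with the separator.
    merged = SEPARATOR.join(b for b in context_blocks if b.strip())
    return compress_fn(merged, user_query, budget_ratio=budget_ratio, policy=policy)
```

In real use you would pass compress_context as compress_fn, together with whichever policy name your installed version supports.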
