The Problem: Context Window Limits
LLMs have finite context windows (e.g., 128K tokens for GPT-4). Long-running agents face several challenges:

- Token accumulation: Each tool call adds messages to history
- Context overflow: Eventually exceeds the model’s limit
- Information loss: Simple truncation loses important context
- Cost increase: More tokens = higher API costs
The Solution: Memory-Based Compression
Koog’s history compression uses the Memory feature to extract and preserve facts before compressing history:

- Extract facts: Use the LLM to identify key information from the conversation
- Store as concepts: Save facts to memory as structured concepts
- Compress history: Replace verbose history with fact summaries
- Inject context: Load facts back into prompt when needed
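The four steps can be sketched as a plain-Kotlin pipeline. Everything below (`Fact`, `extractFacts`, the summary format) is an illustrative stand-in, not Koog's actual API:

```kotlin
// Illustrative stand-ins for the four steps; this is NOT Koog's real API.
data class Fact(val concept: String, val value: String)

// 1. Extract facts — in Koog an LLM does this; here a trivial rule stands in.
fun extractFacts(history: List<String>): List<Fact> =
    history.filter { it.startsWith("tool:") }
        .map { Fact(concept = "tool-result", value = it.removePrefix("tool:").trim()) }

// 2. Store as concepts — facts grouped under their concept keyword.
fun storeAsConcepts(facts: List<Fact>): Map<String, List<String>> =
    facts.groupBy({ it.concept }, { it.value })

// 3. Compress history — verbose messages become one summary line per concept.
fun compressHistory(memory: Map<String, List<String>>): List<String> =
    memory.map { (concept, values) -> "$concept: ${values.joinToString("; ")}" }

fun main() {
    val history = listOf(
        "user: analyze the repository",
        "tool: found 3 style issues",
        "tool: fixed 2 of the issues",
    )
    // 4. Inject context — the summary replaces the verbose history in the prompt.
    println(compressHistory(storeAsConcepts(extractFacts(history))))
}
```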
Installation
History compression is implemented through the Memory feature’s compression strategy.

How It Works
1. Fact Extraction
When compression is triggered, the LLM extracts facts from the conversation history.

2. History Compression
Verbose history is replaced with a fact summary.

3. Context Preservation
Critical information is preserved:

- ✅ Extracted facts (structured knowledge)
- ✅ Last tool call and result (immediate context)
- ✅ Memory-tagged messages (marked as important)
- ❌ Verbose intermediate steps (compressed away)
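The preservation rules can be illustrated with a small filter. The `Message` type and its flags are assumptions made for this sketch, not Koog's internal representation:

```kotlin
// Sketch of which messages survive compression; stand-in types, not Koog's.
data class Message(
    val text: String,
    val isToolCall: Boolean = false,
    val memoryTagged: Boolean = false,
)

fun preserveThroughCompression(history: List<Message>, factSummary: String): List<Message> {
    val lastTool = history.lastOrNull { it.isToolCall }
    return buildList {
        add(Message("Facts: $factSummary"))            // ✅ extracted facts
        addAll(history.filter { it.memoryTagged })     // ✅ memory-tagged messages
        if (lastTool != null) add(lastTool)            // ✅ last tool call/result
        // ❌ everything else (verbose intermediate steps) is dropped
    }
}
```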
Configuration
Define Concepts
Choose what to preserve based on your use case.

Compression Strategy
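As a sketch of concept definitions that a compression strategy could then consume — `Concept`, `FactType`, and the field names below are local stand-ins modeled on the idea, not guaranteed to match Koog's actual signatures:

```kotlin
// Local stand-ins sketching concept definitions; shapes are assumptions,
// not necessarily Koog's actual Concept/FactType API.
enum class FactType { SINGLE, MULTIPLE }

data class Concept(
    val keyword: String,
    val description: String,
    val factType: FactType,
)

// Hypothetical concepts for a code-analysis agent: one single-valued,
// one that naturally collects many values during the run.
val codeAnalysisConcepts = listOf(
    Concept("project-structure", "Key modules and entry points", FactType.SINGLE),
    Concept("found-issues", "Issues discovered during analysis", FactType.MULTIPLE),
)

fun main() {
    for (c in codeAnalysisConcepts) println("${c.keyword} [${c.factType}]: ${c.description}")
}
```

A memory-based compression strategy would extract exactly these concepts from history before discarding the verbose messages.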
Triggering Compression
Compression is typically triggered manually when needed.

Automatic Compression (Advanced)
Implement automatic compression based on token count.

Complete Example
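A complete flow might look like the following plain-Kotlin simulation of an agent loop that compresses once an estimated token budget is exceeded. The token counter, threshold, and summary format are illustrative assumptions, not Koog configuration values:

```kotlin
// End-to-end simulation: history is compressed whenever an estimated
// token budget is exceeded. All numbers here are illustrative.
const val TOKEN_LIMIT = 50  // deliberately tiny; real context windows are far larger

fun estimateTokens(history: List<String>): Int =
    history.sumOf { it.split(" ").size }  // crude whitespace-based estimate

fun compressIfNeeded(history: MutableList<String>) {
    if (estimateTokens(history) <= TOKEN_LIMIT) return
    // Stand-in for LLM fact extraction: one summary line plus the latest message.
    val summary = "Summary: ${history.size - 1} earlier messages compressed"
    val latest = history.last()
    history.clear()
    history += summary
    history += latest
}

fun main() {
    val history = mutableListOf<String>()
    repeat(30) { i ->
        history += "tool call $i returned result $i"
        compressIfNeeded(history)
    }
    // History stays bounded instead of growing linearly with the 30 tool calls.
    println("${history.size} messages, ~${estimateTokens(history)} tokens")
}
```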
Benefits
Extended conversations
Handle much longer interactions without hitting context limits.
Preserved context
Important facts are extracted and preserved, not lost in truncation.
Cost reduction
Fewer tokens in context = lower API costs for each LLM call.
Better focus
Compressed summaries help the LLM focus on relevant information.
Flexible compression
Choose which concepts to preserve based on your use case.
Best Practices
Define domain-specific concepts
Tailor concepts to your agent’s purpose. A code analyzer needs different concepts than a research assistant.
Use MULTIPLE for collections
Use FactType.MULTIPLE for concepts that naturally have many values (issues, files, findings).

Monitor compression triggers
Track when compression happens to tune your thresholds appropriately.
Preserve last interactions
The compression strategy automatically preserves the last tool call/result for continuity.
Test with real workloads
Verify that compressed agents maintain quality with your actual use cases.
When to Use
Use history compression when:

- ✅ Agent handles long, multi-step tasks
- ✅ Multiple tool calls accumulate in history
- ✅ Context window limits are a concern
- ✅ You can define clear concepts to preserve
- ✅ Cost optimization is important

Avoid it when:

- ❌ Conversations are already short
- ❌ Every detail must be preserved verbatim
- ❌ Immediate context is all that matters
- ❌ Simplicity is more important than efficiency
Advanced: Custom Compression
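One way to structure a custom strategy is behind a small interface. This is a generic sketch; Koog's real extension point may differ:

```kotlin
// Generic sketch of a pluggable strategy; Koog's actual extension API may differ.
interface CompressionStrategy {
    fun compress(history: List<String>): List<String>
}

// Collapses everything except the N most recent messages into one summary line.
class KeepLastN(private val n: Int) : CompressionStrategy {
    override fun compress(history: List<String>): List<String> {
        if (history.size <= n) return history
        val dropped = history.size - n
        return listOf("[$dropped earlier messages compressed]") + history.takeLast(n)
    }
}
```

An LLM-backed implementation could replace the summary line with extracted facts while keeping the same interface.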
Implement your own compression strategy when the built-in options do not fit your use case.

Compression vs. Memory Preservation
| Feature | History Compression | Memory (Facts) | Persistence (Checkpoints) |
|---|---|---|---|
| Purpose | Reduce token usage | Store structured knowledge | Save complete state |
| Duration | Current session | Across sessions | Across sessions |
| Format | LLM-generated summary | Structured facts | Complete snapshot |
| Use case | Long conversations | Knowledge retention | Recovery/rollback |
| Overhead | Low | Medium | High |
Related Features
Memory
Underlying feature that powers history compression
Persistence
Save complete agent state with checkpoints