TOON is designed for token efficiency, but performance considerations extend beyond token count. This guide covers latency, throughput, memory usage, and optimization strategies for production use.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/toon-format/toon/llms.txt
Use this file to discover all available pages before exploring further.
Performance Characteristics
Token Efficiency vs. Processing Speed
TOON reduces token count significantly (35-60% vs JSON for uniform arrays), but encoding/decoding has computational overhead:- Encoding: Analyzing array uniformity and building tabular structures
- Decoding: Parsing headers, validating lengths, and reconstructing objects
For latency-critical applications, measure end-to-end response time on your exact setup. Some deployments (especially local/quantized models) may process compact JSON faster despite higher token count.
Performance Trade-offs
The optimal format depends on your bottleneck:| Bottleneck | Best Format | Reason |
|---|---|---|
| API costs | TOON | Minimizes tokens ($$) |
| Context limits | TOON | Fits more data in window |
| Network transfer | JSON compact | Smaller bytes over wire |
| Parsing speed | JSON | Simpler structure |
| Local inference | Benchmark both | Token/sec varies by model |
Memory Usage
Encoding
TOON encoding builds the full output string in memory:Decoding
TOON decoding builds the full object tree in memory:Streaming Decode Event Types
Streaming Decode Event Types
The
decodeStream API yields structured JSON events:{ type: 'startObject' }- Object begins{ type: 'endObject' }- Object ends{ type: 'startArray', length: number }- Array begins (length from[N]){ type: 'endArray' }- Array ends{ type: 'key', key: string, wasQuoted?: boolean }- Object key{ type: 'primitive', value: JsonPrimitive }- Primitive value
Optimization Strategies
1. Use Tab Delimiters for Maximum Efficiency
Comma is the default delimiter, but tab is more token-efficient:2. Enable Key Folding for Nested Objects
Key folding collapses single-key wrapper chains into dotted paths:keyFolding: 'safe' only folds when all segments are valid identifiers. Set expandPaths: 'safe' when decoding to reconstruct the original structure.3. Use Streaming for Large Datasets
Streaming encoding/decoding prevents memory spikes:4. Control Indentation for Compact Output
TOON defaults to 2-space indentation. Use 0 spaces for maximum compactness:Zero indentation reduces readability. Use it only when token count is critical and human inspection isn’t needed.
5. Filter Data with Replacer Function
Remove unnecessary fields before encoding to reduce token count:Latency Considerations
Time to First Token (TTFT)
TOON’s structured headers may slightly increase TTFT:- JSON: Model starts generating immediately
- TOON: Model must emit array header (
[N]{fields}:) before first row
Tokens per Second
Some models process simpler structures faster:| Scenario | Recommendation |
|---|---|
| Cloud APIs | Use TOON (token-based pricing) |
| Local models | Benchmark both formats |
| Streaming output | Test TTFT and tokens/sec |
Production Best Practices
1. Cache Encoded Outputs
If encoding the same data repeatedly, cache the TOON output:2. Validate Input Data
TOON encoding assumes valid JSON data model. Validate inputs to prevent encoding errors:3. Monitor Token Usage
Track token savings in production:4. Use Strict Mode for Production
Strict mode validates array lengths and tabular row counts during decoding:5. Handle Decode Errors Gracefully
Wrap decode calls in try-catch for production:Performance Checklist
Encoding
Encoding
- Use tab delimiter for production (
delimiter: DELIMITERS.tab) - Enable key folding for nested configs (
keyFolding: 'safe') - Use streaming for large datasets (
encodeLines()) - Filter unnecessary fields with replacer function
- Cache encoded outputs when possible
- Reduce indentation if tokens are critical (
indent: 0)
Decoding
Decoding
- Use streaming for large inputs (
decodeStream()) - Enable strict mode in production (
strict: true) - Handle decode errors gracefully (try-catch)
- Enable path expansion if using key folding (
expandPaths: 'safe')
Production Monitoring
Production Monitoring
- Track token savings (TOON vs JSON)
- Monitor encoding/decoding latency
- Measure end-to-end response time
- Validate memory usage with large datasets
- Test with your specific model/tokenizer
