LLM Accuracy Report
Comprehensive analysis of how different serialization formats affect LLM accuracy across retrieval, generation, and embedding tasks.
Executive Summary
Retrieval
100% accuracy: GLYPH matches JSON on large models
Generation
11% valid rate: JSON dominates at 100%
Embeddings
0% penalty: with semantic projection
Key Findings
| Metric | Winner | GLYPH Result | Notes |
|---|---|---|---|
| Retrieval Accuracy | JSON/GLYPH | 100% (tied) | Large models handle all formats well |
| Generation Quality | JSON | 11% valid | LLMs trained primarily on JSON |
| Embedding Similarity | ALL EQUAL | 0.54 (same) | Format irrelevant with semantic projection |
| Token Efficiency | GLYPH | -48% tokens | Significant cost savings |
| Best Balance | GLYPH | - | 100% accuracy, 48% smaller, no RAG penalty |
Retrieval Accuracy
By Model Size
Results are reported per model (one tab per model on the original page): llama3.2:3b, qwen3:8b, mistral-small:24b. The table below shows one model's results:
| Codec | Correct | Total | Accuracy |
|---|---|---|---|
| JSON | 19 | 20 | 95.0% |
| GLYPH | 19 | 20 | 95.0% |
| GLYPH+Pool | 19 | 20 | 95.0% |
| ZON | 18 | 20 | 90.0% |
| TOON | 19 | 20 | 95.0% |
GLYPH achieves 100% accuracy on large models (24B+ parameters), matching JSON while using 48% fewer tokens.
By Question Type
| Question Type | JSON | GLYPH | ZON | TOON |
|---|---|---|---|---|
| Direct lookup | 100% | 100% | 100% | 100% |
| Nested access | 100% | 100% | 100% | 100% |
| Boolean values | 100% | 100% | 95%* | 100% |
| Counting | 90% | 95% | 85% | 90% |
| Aggregation | 100% | 100% | 100% | 100% |
*ZON uses `T/F` for booleans, which smaller models sometimes misinterpret; GLYPH uses `t/f` with better results.
Key Observations
- Larger models handle all formats well: mistral-small:24b achieves 100% on JSON, TOON, and GLYPH
- Counting is the hardest task: all models struggle with “how many X have Y” questions
- GLYPH’s tabular format helps: the `@tab` format makes array data clearer to LLMs
- Boolean syntax matters: `t/f` (GLYPH) outperforms `T/F` (ZON) on smaller models
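As an illustration of the two syntax points above (a hedged sketch only: `key=value` pairs and `t/f` booleans are the features this report names; consult the GLYPH documentation for the actual grammar):

```text
JSON:  {"verified": true}
GLYPH: verified=t
```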
Generation Quality
Results Across All Models
| Codec | Parsed | Valid | Success Rate | Notes |
|---|---|---|---|---|
| JSON | 100% | 100% | 100% | Native format, always validates |
| ZON | 100% | 0% | 0% | Parses but fails schema validation |
| TOON | 67% | 33% | 33% | YAML-like confusion |
| GLYPH | 78% | 11% | 11% | Parses but often wrong types |
| GLYPH+Pool | 56% | 0% | 0% | Pool syntax confuses models |
Analysis
Why JSON dominates generation
- Training data bias: LLMs are trained extensively on JSON
- Syntax familiarity: `{"key": "value"}` is deeply ingrained
- Tooling integration: JSON validators are built into prompts
- Error patterns: models “know” valid JSON structure
Why GLYPH struggles (but still works sometimes)
- Simple syntax helps: `key=value` is learnable from examples
- Nested structures fail: array/object nesting is unreliable
- No training data: models never saw GLYPH during training
- Pool references break: LLMs don’t understand `^S1:3` syntax
Recommendations
For LLM-Generated Output
✅ Use JSON for reliability
⚠️ Use GLYPH only with:
- Clear examples in prompt
- Error handling/retry logic
- Simple flat structures
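The error-handling/retry bullet above can be sketched as a small loop. This is a minimal sketch, assuming a hypothetical `llm_generate` callable and a `parse_and_validate` function that raises `ValueError` on invalid output (neither name comes from this report):

```python
def generate_with_retry(prompt, llm_generate, parse_and_validate, max_attempts=3):
    """Ask the model for structured output; re-prompt with the error on failure."""
    last_error = None
    for _ in range(max_attempts):
        # On retries, feed the previous failure back so the model can self-correct.
        full_prompt = prompt if last_error is None else (
            f"{prompt}\n\nPrevious attempt failed: {last_error}\nTry again."
        )
        text = llm_generate(full_prompt)
        try:
            return parse_and_validate(text)  # raises ValueError on bad syntax/schema
        except ValueError as err:
            last_error = str(err)
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

The same wrapper works for JSON output; it simply matters more for GLYPH, where first-attempt validity is only 11%.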
For LLM-Consumed Input
✅ Use GLYPH for efficiency
Benefits:
- 48% fewer tokens
- 100% retrieval accuracy
- Better for context windows
- Human-readable logs
Embedding Similarity (RAG)
Critical Insight: Never embed wire format directly. Always use semantic projection.
Wire vs Semantic Comparison
| Codec | Wire (naive) | Semantic (correct) | Difference |
|---|---|---|---|
| JSON | 0.5835 | 0.5407 | -7.3% |
| GLYPH | 0.5320 | 0.5407 | +1.6% |
| ZON | 0.5511 | 0.5407 | -1.9% |
| TOON | 0.5835 | 0.5407 | -7.3% |
With semantic projection, ALL codecs achieve identical 0.5407 similarity.
The Bug
Embedding the wire format directly compares tokens, not meaning: GLYPH and JSON encode the same data with different token streams, so their naive embeddings diverge.
Semantic Projection
Projecting each wire format to a canonical semantic view before embedding removes the format signal: the GLYPH and JSON wire bytes differ, but both formats produce the same semantic view, and therefore identical embeddings.
Correct RAG Architecture
Store the GLYPH wire format, embed its semantic projection, and link the two by CID (see the hybrid pattern under Recommendations).
Result: GLYPH compression with ZERO RAG accuracy loss.
Size Comparison
Benefits of GLYPH for LLM Context
| Dataset | JSON tokens | GLYPH tokens | Savings |
|---|---|---|---|
| Simple | ~30 | ~20 | -33% |
| Nested | ~90 | ~50 | -44% |
| Tabular | ~200 | ~70 | -65% |
| Complex | ~190 | ~95 | -50% |
| Average | - | - | -48% |
Context Window Math
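With the ~48% average savings above, a fixed token budget holds roughly 1.9× as much GLYPH data as JSON. A back-of-envelope calculation using the “Complex” row from the table (the 128K-token budget is an illustrative assumption, not a figure from this report):

```python
budget = 128_000                 # tokens available for context (illustrative)
json_per_record = 190            # "Complex" dataset, JSON tokens per record
glyph_per_record = 95            # same dataset encoded as GLYPH

print(budget // json_per_record)   # records that fit as JSON
print(budget // glyph_per_record)  # records that fit as GLYPH (about 2x)
```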
Recommendations
Architecture Pattern (Hybrid)
- Store: GLYPH wire → CID (compact, canonical)
- Index: semantic view → embeddings (format-independent)
- Query: embed query → vector search → fetch GLYPH via CID
- Generate: ask the LLM for JSON (reliable output)
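The four stages above can be sketched end to end. Every function taken as a parameter here (`glyph_encode`, `semantic_view`, `embed`, `cid_of`, `similarity`) is a hypothetical placeholder for whatever implementation your stack provides; only the wiring between them reflects the pattern:

```python
def ingest(record, store, index, glyph_encode, semantic_view, embed, cid_of):
    """Store compact GLYPH wire bytes; index a format-independent embedding."""
    wire = glyph_encode(record)                        # Store: compact, canonical bytes
    cid = cid_of(wire)                                 # content-address the blob
    store[cid] = wire
    index.append((embed(semantic_view(record)), cid))  # Index: embed the semantic view
    return cid

def query(question, store, index, embed, similarity):
    """Query: embed the question, vector-search the index, fetch GLYPH via CID."""
    q = embed(question)
    _, best_cid = max(index, key=lambda pair: similarity(q, pair[0]))
    return store[best_cid]  # compact blob for the LLM to read; generation stays JSON
```

Because the index stores embeddings of the semantic view rather than the wire bytes, swapping the storage codec never invalidates the vector index.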
Use GLYPH Wire Format When
- ✅ Token budget is constrained (long conversations)
- ✅ Data is tabular/repetitive (logs, events)
- ✅ LLM will read but not generate
- ✅ Storage efficiency matters (48% smaller)
- ✅ You need canonical CID-addressable blobs
- ✅ Context window optimization critical
Use JSON When
- ✅ LLM needs to generate structured output
- ✅ Interoperability with external systems
- ✅ You don’t control the embedding pipeline
- ✅ Maximum compatibility required
Use Semantic Projection When
- ✅ Building RAG / vector search indexes
- ✅ Want format-independent embeddings
- ✅ Storing GLYPH but need good retrieval
- ✅ Multiple wire formats in same system
Implementation Example
Semantic Projection Function
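A minimal sketch of the projection described above: flatten any structure into canonical, sorted `key: value` lines, so that two payloads with the same meaning project to identical text regardless of wire format. This is an illustration assuming dict/list/scalar inputs, not the report's actual implementation:

```python
def semantic_projection(value, prefix=""):
    """Flatten nested data into sorted, canonical `key: value` lines.

    Identical meaning produces identical lines, so embeddings of the
    projection are independent of the wire format (JSON, GLYPH, ...).
    """
    lines = []
    if isinstance(value, dict):
        for key in sorted(value):               # sort keys for canonical order
            lines.extend(semantic_projection(value[key], f"{prefix}{key}."))
    elif isinstance(value, list):
        for i, item in enumerate(value):        # index array elements by position
            lines.extend(semantic_projection(item, f"{prefix}{i}."))
    else:
        lines.append(f"{prefix.rstrip('.')}: {value}")
    return lines

doc = {"user": {"name": "alice", "verified": True}}
print("\n".join(semantic_projection(doc)))
```

Embed the joined lines (e.g., with `nomic-embed-text`), never the raw wire bytes.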
Test Methodology
Models Tested
| Model | Parameters | Type |
|---|---|---|
| llama3.2:3b | 3B | Small, general purpose |
| qwen3:8b | 8B | Medium, instruction-tuned |
| mistral-small:24b | 24B | Large, high capability |
Embedding Model
`nomic-embed-text` - 768-dim embeddings for semantic similarity
Test Categories
- Direct lookup - “What is the user’s name?”
- Nested access - “What is the user’s email?”
- Boolean values - “Is the account verified?”
- Counting - “How many users have admin role?”
- Aggregation - “What is the average salary?”
Reproduction
The Key Insight
Compression and Embedding Quality Are Orthogonal
When you:
- Store/transport the compact wire format (GLYPH)
- Embed a canonical semantic projection (`key: value` lines)
- Link them via CID

you get:
- 48% size reduction (tokens, bytes)
- Identical RAG accuracy (embeddings)
- 100% retrieval accuracy (LLM understanding)
Related Documentation
Benchmark Results
Full codec comparison across all metrics
Performance Report
Parser speed and optimization details