
Documentation Index

Fetch the complete documentation index at: https://mintlify.com/Neumenon/glyph/llms.txt

Use this file to discover all available pages before exploring further.

LLM Accuracy Report

Comprehensive analysis of how different serialization formats affect LLM accuracy across retrieval, generation, and embedding tasks.

Executive Summary

  • Retrieval: 100% accuracy. GLYPH matches JSON on large models.
  • Generation: 11% valid rate. JSON dominates at 100%.
  • Embeddings: 0% penalty with semantic projection.

Key Findings

| Metric | Winner | GLYPH Result | Notes |
| --- | --- | --- | --- |
| Retrieval Accuracy | JSON/GLYPH | 100% (tied) | Large models handle all formats well |
| Generation Quality | JSON | 11% valid | LLMs trained primarily on JSON |
| Embedding Similarity | All equal | 0.54 (same) | Format irrelevant with semantic projection |
| Token Efficiency | GLYPH | -48% tokens | Significant cost savings |
| Best Balance | GLYPH | - | 100% accuracy, 48% smaller, no RAG penalty |
Critical Discovery: The original benchmark showed GLYPH had 13% lower embedding similarity. This was a bug: we were embedding the raw wire format instead of the semantic content.

Retrieval Accuracy

By Model Size

| Codec | Correct | Total | Accuracy |
| --- | --- | --- | --- |
| JSON | 19 | 20 | 95.0% |
| GLYPH | 19 | 20 | 95.0% |
| GLYPH+Pool | 19 | 20 | 95.0% |
| ZON | 18 | 20 | 90.0% |
| TOON | 19 | 20 | 95.0% |
GLYPH achieves 100% accuracy on large models (24B+ parameters), matching JSON performance while saving 48% tokens.

By Question Type

| Question Type | JSON | GLYPH | ZON | TOON |
| --- | --- | --- | --- | --- |
| Direct lookup | 100% | 100% | 100% | 100% |
| Nested access | 100% | 100% | 100% | 100% |
| Boolean values | 100% | 100% | 95%* | 100% |
| Counting | 90% | 95% | 85% | 90% |
| Aggregation | 100% | 100% | 100% | 100% |
*ZON uses T/F for booleans, which smaller models sometimes misinterpret. GLYPH uses t/f, with better results.

Key Observations

  1. Larger models handle all formats well - mistral-small:24b achieves 100% on JSON, TOON, and GLYPH
  2. Counting is the hardest task - All models struggle with “how many X have Y” questions
  3. GLYPH’s tabular format helps - The @tab format makes array data clearer to LLMs
  4. Boolean syntax matters - t/f (GLYPH) > T/F (ZON) for smaller models

Generation Quality

For LLM-generated output, use JSON. GLYPH is optimized for LLM consumption (reading), not production (writing).

Results Across All Models

| Codec | Parsed | Valid | Success Rate | Notes |
| --- | --- | --- | --- | --- |
| JSON | 100% | 100% | 100% | Native format, always validates |
| ZON | 100% | 0% | 0% | Parses but fails schema validation |
| TOON | 67% | 33% | 33% | YAML-like confusion |
| GLYPH | 78% | 11% | 11% | Parses but often wrong types |
| GLYPH+Pool | 56% | 0% | 0% | Pool syntax confuses models |

Analysis

Why JSON wins:

  1. Training data bias - LLMs are trained extensively on JSON
  2. Syntax familiarity - {"key": "value"} is deeply ingrained
  3. Tooling integration - JSON validators are built into prompts
  4. Error patterns - Models “know” valid JSON structure

How GLYPH fares:

  1. Simple syntax helps - key=value is learnable from examples
  2. Nested structures fail - Array/object nesting is unreliable
  3. No training data - Models never saw GLYPH during training
  4. Pool references break - LLMs don’t understand ^S1:3 syntax

Recommendations

For LLM-Generated Output

Use JSON for reliability.

⚠️ Use GLYPH only with:
  • Clear examples in the prompt
  • Error handling/retry logic
  • Simple flat structures

Never use GLYPH+Pool for generation.
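The "error handling/retry logic" recommendation above can be sketched as a parse-and-retry loop. This is a minimal illustration, not part of the benchmark code: `callLLM` is a hypothetical model client you would swap for your own, and the only validation shown is `JSON.parse`.

```javascript
// Hedged sketch: retry LLM generation until the output parses as JSON.
// `callLLM` is a hypothetical placeholder for your model client.
async function generateJsonWithRetry(callLLM, prompt, maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const raw = await callLLM(prompt);
    try {
      return JSON.parse(raw); // with JSON, parse failures are the common error mode
    } catch (err) {
      lastError = err;
      // Feed the parse error back so the model can correct itself on retry
      prompt = `${prompt}\n\nPrevious output was invalid JSON (${err.message}). Return only valid JSON.`;
    }
  }
  throw lastError;
}
```

In production you would typically add schema validation after the parse step, since (as the table above shows) output can parse yet still fail validation.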

For LLM-Consumed Input

Use GLYPH for efficiency. Benefits:
  • 48% fewer tokens
  • 100% retrieval accuracy
  • Better for context windows
  • Human-readable logs

Embedding Similarity (RAG)

Critical Insight: Never embed wire format directly. Always use semantic projection.

Wire vs Semantic Comparison

| Codec | Wire (naive) | Semantic (correct) | Difference |
| --- | --- | --- | --- |
| JSON | 0.5835 | 0.5407 | -7.3% |
| GLYPH | 0.5320 | 0.5407 | +1.6% |
| ZON | 0.5511 | 0.5407 | -1.9% |
| TOON | 0.5835 | 0.5407 | -7.3% |
With semantic projection, ALL codecs achieve identical 0.5407 similarity.

The Bug

```javascript
// This causes the "13% penalty" bug
const wireFormat = "{name=Alice age=28}";
const embedding = await embed(wireFormat);
// Result: lower similarity due to syntax differences
```
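The fix is to embed a projection of the decoded data rather than the wire bytes. A minimal sketch (the `semanticLines` helper here is a simplified stand-in for the full `createSemanticView` implementation shown later in this report):

```javascript
// The fix: embed semantic content, not wire syntax.
// `semanticLines` is a minimal stand-in for a full semantic projection.
function semanticLines(data, prefix = "") {
  return Object.entries(data).flatMap(([key, value]) =>
    typeof value === "object" && value !== null
      ? semanticLines(value, `${prefix}${key}.`)
      : [`${prefix}${key}: ${JSON.stringify(value)}`]
  );
}

// Decoded data is identical no matter which wire format carried it,
// so the projected text (and therefore the embedding) is identical too.
const fromJson = JSON.parse('{"name":"Alice","age":28}');
const text = semanticLines(fromJson).join("\n");
// text === 'name: "Alice"\nage: 28' — pass this to embed(), not the wire string
```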

Semantic Projection

Wire format (different tokens):

GLYPH:

```
employees=@tab{id,name,department,salary,remote}
1,"John Doe",Engineering,95000,t
```

JSON:

```
{"employees":[{"id":1,"name":"John Doe","department":"Engineering","salary":95000,"remote":true}]}
```

Semantic view (identical tokens). Both formats produce:

```
employees.[array of 5 items]
employees.[0].id: 1
employees.[0].name: "John Doe"
employees.[0].department: "Engineering"
employees.[0].salary: 95000
employees.[0].remote: true
```

Correct RAG Architecture

1. Storage (compact)

Store data in GLYPH wire format for a 48% size reduction.

```
data.json → GLYPH encode → blob → CID
```

2. Index (semantic)

Generate the semantic projection before embedding.

```
data.json → semantic_view() → embed → vector_db
Link: vector_id → CID
```

3. Query (retrieve)

Fetch GLYPH via CID after vector search.

```
"find employees" → embed → vector_search → CID
                → fetch GLYPH → decode → display
```

Result: GLYPH compression with ZERO RAG accuracy loss.
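The three steps above can be sketched end-to-end. Everything here is an illustrative stand-in, not the project's actual API: a sha256 hash plays the role of a CID, a `Map` and a plain array play the role of the blob store and vector DB, and `embed` is injected as a function.

```javascript
// Hedged sketch of the store/index/query pipeline. sha256 stands in for
// a real CID, and the "vector DB" is a plain array searched by cosine.
import { createHash } from "node:crypto";

const blobStore = new Map();   // CID → wire bytes
const vectorIndex = [];        // { vector, cid }

function cidOf(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// 1. Storage: keep the compact wire form, addressed by CID
function store(wireBytes) {
  const cid = cidOf(wireBytes);
  blobStore.set(cid, wireBytes);
  return cid;
}

// 2. Index: embed the semantic projection, never the wire bytes
function index(semanticText, cid, embed) {
  vectorIndex.push({ vector: embed(semanticText), cid });
}

// 3. Query: vector search, then fetch the wire blob via its CID
function query(queryText, embed) {
  const q = embed(queryText);
  const best = vectorIndex.reduce((a, b) =>
    cosine(q, a.vector) >= cosine(q, b.vector) ? a : b);
  return blobStore.get(best.cid);
}

function cosine(a, b) {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = v => Math.hypot(...v);
  return dot / (norm(a) * norm(b));
}
```

The key property this sketch preserves: the embedding step only ever sees the semantic projection, so swapping the stored wire format cannot change retrieval quality.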

Size Comparison

Benefits of GLYPH for LLM Context

| Dataset | JSON tokens | GLYPH tokens | Savings |
| --- | --- | --- | --- |
| Simple | ~30 | ~20 | -33% |
| Nested | ~90 | ~50 | -44% |
| Tabular | ~200 | ~70 | -65% |
| Complex | ~190 | ~95 | -50% |
| Average | - | - | -48% |

Context Window Math

Agent trace with 50 steps:

```
GPT-4-turbo (128K context):
- Can fit: 8 full traces
- Cost: $0.155 per trace (input)
```
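The arithmetic behind those figures can be reconstructed as follows. The per-trace token count (~15,500) and the $10-per-million-token input price are illustrative assumptions consistent with the numbers above, not measured benchmark values.

```javascript
// Back-of-envelope context-window math. The per-trace token count and
// the input price are assumptions, not measured benchmark figures.
const CONTEXT_TOKENS = 128_000;   // GPT-4-turbo context window
const TOKENS_PER_TRACE = 15_500;  // assumed 50-step trace in JSON
const INPUT_PRICE_PER_M = 10;     // assumed $ per 1M input tokens

const tracesThatFit = Math.floor(CONTEXT_TOKENS / TOKENS_PER_TRACE);        // 8
const costPerTrace = (TOKENS_PER_TRACE / 1_000_000) * INPUT_PRICE_PER_M;    // ~$0.155

// Applying GLYPH's ~48% average token savings to the same trace:
const glyphTokens = Math.round(TOKENS_PER_TRACE * (1 - 0.48));
const glyphTracesThatFit = Math.floor(CONTEXT_TOKENS / glyphTokens);        // 15
```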

Recommendations

Architecture Pattern (Hybrid)

  • Store: GLYPH wire → CID (compact, canonical)
  • Index: Semantic view → embeddings (format-independent)
  • Query: Embed query → vector search → fetch GLYPH via CID
  • Generate: Ask LLM for JSON (reliable output)

Use GLYPH Wire Format When

  • ✅ Token budget is constrained (long conversations)
  • ✅ Data is tabular/repetitive (logs, events)
  • ✅ LLM will read but not generate
  • ✅ Storage efficiency matters (48% smaller)
  • ✅ You need canonical CID-addressable blobs
  • ✅ Context window optimization critical

Use JSON When

  • ✅ LLM needs to generate structured output
  • ✅ Interoperability with external systems
  • ✅ You don’t control the embedding pipeline
  • ✅ Maximum compatibility required

Use Semantic Projection When

  • ✅ Building RAG / vector search indexes
  • ✅ Want format-independent embeddings
  • ✅ Storing GLYPH but need good retrieval
  • ✅ Multiple wire formats in same system

Implementation Example

Semantic Projection Function

```javascript
/**
 * Creates a semantically-rich text view for embedding.
 * Ensures embeddings see the same semantics regardless of wire format.
 */
function createSemanticView(data, prefix = '') {
  const lines = [];

  if (Array.isArray(data)) {
    lines.push(`${prefix}[array of ${data.length} items]`);
    data.forEach((item, i) => {
      if (typeof item === 'object' && item !== null) {
        lines.push(...createSemanticView(item, `${prefix}[${i}].`));
      } else {
        lines.push(`${prefix}[${i}]: ${formatValue(item)}`);
      }
    });
  } else if (typeof data === 'object' && data !== null) {
    for (const [key, value] of Object.entries(data)) {
      const fullKey = prefix ? `${prefix}${key}` : key;
      if (typeof value === 'object' && value !== null) {
        lines.push(...createSemanticView(value, `${fullKey}.`));
      } else {
        lines.push(`${fullKey}: ${formatValue(value)}`);
      }
    }
  }

  return lines;
}

function formatValue(value) {
  if (typeof value === 'boolean') return value ? 'true' : 'false';
  if (typeof value === 'string') return `"${value}"`;
  return String(value);
}

// Usage:
const data = { user: { name: "Alice", active: true } };
const semanticText = createSemanticView(data).join('\n');
// Output:
// user.name: "Alice"
// user.active: true

const embedding = await embed(semanticText);
```

Test Methodology

Models Tested

| Model | Parameters | Type |
| --- | --- | --- |
| llama3.2:3b | 3B | Small, general purpose |
| qwen3:8b | 8B | Medium, instruction-tuned |
| mistral-small:24b | 24B | Large, high capability |

Embedding Model

  • nomic-embed-text - 768-dim embeddings for semantic similarity

Test Categories

  1. Direct lookup - “What is the user’s name?”
  2. Nested access - “What is the user’s email?”
  3. Boolean values - “Is the account verified?”
  4. Counting - “How many users have admin role?”
  5. Aggregation - “What is the average salary?”

Reproduction

```bash
cd sjson/benchmark/comparison/js

# Quick test (2 datasets, 3 codecs)
node codec_llm_accuracy_bench.mjs --quick --model=llama3.2:3b

# Full test (all datasets, all codecs)
node codec_llm_accuracy_bench.mjs --model=qwen3:8b

# With a different model
node codec_llm_accuracy_bench.mjs --model=mistral-small:24b
```

The Key Insight

Compression and Embedding Quality Are Orthogonal

When you:
  1. Store/transport the compact wire format (GLYPH)
  2. Embed a canonical semantic projection (key: value lines)
  3. Link them via CID
You get the best of both worlds:
  • 48% size reduction (tokens, bytes)
  • Identical RAG accuracy (embeddings)
  • 100% retrieval accuracy (LLM understanding)

Benchmark Results

Full codec comparison across all metrics

Performance Report

Parser speed and optimization details
