Schemas

All Mega Brain state files and artifacts are validated against JSON Schema definitions to ensure data integrity.

Schema Index

| Schema | State File | Purpose |
|---|---|---|
| chunks-state.schema.json | CHUNKS-STATE.json | Semantic chunks with embeddings |
| canonical-map.schema.json | CANONICAL-MAP.json | Entity canonicalization mappings |
| insights-state.schema.json | INSIGHTS-STATE.json | Extracted insights by priority |
| narratives-state.schema.json | NARRATIVES-STATE.json | Synthesized narratives |
| file-registry.schema.json | file-registry.json | Processed file tracking |
| decisions-registry.schema.json | decisions-registry.json | Council decisions and precedents |

Location: core/schemas/

State File Locations

processing/
├── chunks/
│   ├── CHUNKS-STATE.json           # Master chunk index
│   └── {source-id}.json            # Per-source chunks
├── canonical/
│   ├── CANONICAL-MAP.json          # Entity mappings
│   ├── ENTITY-REGISTRY.json        # Entity tracking
│   └── review_queue.jsonl          # Merge candidates
├── insights/
│   └── INSIGHTS-STATE.json         # Extracted insights
└── narratives/
    └── NARRATIVES-STATE.json       # Synthesized narratives

system/REGISTRY/
└── file-registry.json              # File tracking

logs/SYSTEM/
└── decisions-registry.json         # Council decisions

chunks-state.schema.json

Validates chunk state with embeddings and metadata.

Schema Version: 1.0.0

Structure

{
  "metadata": {
    "version": 1,
    "created_at": "2026-03-05T12:00:00Z",
    "updated_at": "2026-03-05T14:30:00Z",
    "total_sources": 15,
    "total_chunks": 342
  },
  "chunks_by_source": {
    "CG001": {
      "source_id": "CG001",
      "source_name": "Cole Gordon",
      "source_file": "inbox/COLE-GORDON/PODCASTS/farm-system.txt",
      "source_hash": "sha256:...",
      "chunk_count": 23,
      "processed_at": "2026-03-05T12:00:00Z",
      "chunks": [
        {
          "chunk_id": "CG001-001",
          "content": "The farm system is...",
          "word_count": 847,
          "embedding": [0.123, -0.456, ...],  // 1024-dim vector
          "persons_mentioned": ["Cole Gordon"],
          "roles_mentioned": ["CLOSER", "BDR"],
          "themes": ["02-PROCESSO-VENDAS"],
          "priority": "HIGH",
          "metadata": {
            "chunk_index": 0,
            "start_char": 0,
            "end_char": 5234
          }
        }
      ]
    }
  },
  "change_log": [
    {
      "timestamp": "2026-03-05T12:00:00Z",
      "action": "source_added",
      "source_id": "CG001",
      "chunk_count": 23
    }
  ]
}

Field Definitions

| Field | Type | Required | Description |
|---|---|---|---|
| metadata | object | yes | Schema metadata and statistics |
| chunks_by_source | object | yes | Dictionary mapping source_id to chunk data |
| change_log | array | yes | Audit trail of all changes to this state file |

Validation

import json
import jsonschema

# Load schema
with open('core/schemas/chunks-state.schema.json') as f:
    schema = json.load(f)

# Load data
with open('processing/chunks/CHUNKS-STATE.json') as f:
    data = json.load(f)

# Validate
jsonschema.validate(data, schema)
Source: core/schemas/chunks-state.schema.json:1-xxx
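
Structural validation alone may not catch counters that drift out of sync with the data they describe. The following is a minimal sketch of such a cross-check, assuming the field names shown in the Structure example and assuming that every chunk of a source is stored inline under `chunks`. `check_chunk_counts` is a hypothetical helper, not part of Mega Brain.

```python
def check_chunk_counts(state: dict) -> list[str]:
    """Return a list of problems; an empty list means all counters agree."""
    problems = []
    sources = state.get("chunks_by_source", {})

    for key, source in sources.items():
        actual = len(source.get("chunks", []))
        # The dictionary key and the embedded source_id should be identical.
        if source.get("source_id") != key:
            problems.append(f"{key}: source_id is {source.get('source_id')!r}")
        # chunk_count should equal the number of stored chunks (assumption).
        if source.get("chunk_count") != actual:
            problems.append(
                f"{key}: chunk_count={source.get('chunk_count')} but {actual} chunks stored"
            )

    metadata = state.get("metadata", {})
    total = sum(len(s.get("chunks", [])) for s in sources.values())
    if metadata.get("total_sources") != len(sources):
        problems.append(f"metadata.total_sources should be {len(sources)}")
    if metadata.get("total_chunks") != total:
        problems.append(f"metadata.total_chunks should be {total}")
    return problems
```

Run it after `jsonschema.validate` succeeds, so the function can rely on the overall shape being correct.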

canonical-map.schema.json

Entity canonicalization mappings and aliases.

Schema Version: 1.0.0

Structure

{
  "metadata": {
    "version": 15,
    "updated_at": "2026-03-05T14:30:00Z"
  },
  "persons": {
    "Alex Hormozi": {
      "canonical": "Alex Hormozi",
      "aliases": ["alex hormozi", "hormozi", "Alex H"],
      "sources": ["HR001", "HR002", "CG005"],
      "mention_count": 147,
      "has_agent": true,
      "has_dna": true
    }
  },
  "roles": {
    "CLOSER": {
      "canonical": "CLOSER",
      "aliases": ["closer", "sales closer", "closers"],
      "mention_count": 89,
      "mention_breakdown": {
        "direct": 75,
        "inferred": 10,
        "emergent": 4
      },
      "weighted_score": 85.5,
      "sources": ["CG001", "CG002", "JM001"],
      "has_agent": true,
      "domain_ids": ["SALES"]
    }
  },
  "themes": {
    "processo-vendas": {
      "canonical": "processo-vendas",
      "theme_code": "02-PROCESSO-VENDAS",
      "aliases": ["sales process", "processo de vendas"],
      "occurrence_count": 234,
      "sources": ["CG001", "JM001", "HR003"],
      "has_dossier": true,
      "domain_ids": ["SALES"]
    }
  },
  "concepts": {
    "Farm System": {
      "canonical": "Farm System",
      "aliases": ["farm system", "the farm"],
      "layer": "L4",  // DNA layer
      "occurrence_count": 42,
      "sources": ["CG001", "CG002"]
    }
  }
}

Usage

from core.intelligence.entity_normalizer import normalize_entity

result = normalize_entity("alex hormozi", "person")
# Returns: {"canonical": "Alex Hormozi", "match_type": "alias", ...}
Source: core/schemas/canonical-map.schema.json:1-xxx
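
To illustrate how the `aliases` lists in CANONICAL-MAP.json can drive a lookup, here is a minimal sketch of a case-insensitive alias index. It is not the implementation of `normalize_entity`; `build_alias_index` and `lookup` are hypothetical names, and exact case-folded matching is an assumption.

```python
from typing import Optional


def build_alias_index(section: dict) -> dict[str, str]:
    """Map every case-folded alias (and canonical name) to its canonical form.

    `section` is one top-level block of the map, e.g. data["persons"].
    """
    index = {}
    for name, entry in section.items():
        canonical = entry.get("canonical", name)
        index[canonical.casefold()] = canonical
        for alias in entry.get("aliases", []):
            index[alias.casefold()] = canonical
    return index


def lookup(index: dict[str, str], raw: str) -> Optional[str]:
    """Return the canonical name for `raw`, or None when nothing matches."""
    return index.get(raw.strip().casefold())
```

For example, with the `persons` block shown above, `lookup(build_alias_index(data["persons"]), "Hormozi")` resolves to `"Alex Hormozi"`.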

insights-state.schema.json

Extracted insights with DNA layer classification.

Schema Version: 1.0.0

Structure

{
  "metadata": {
    "version": 8,
    "updated_at": "2026-03-05T14:30:00Z"
  },
  "insights_state": {
    "persons": {
      "Cole Gordon": {
        "HIGH": [
          {
            "insight_id": "INS-CG001-001",
            "chunk_id": "CG001-005",
            "content": "The farm system requires 3 closers per setter to maintain balance.",
            "dna_layer": "L4",  // FRAMEWORKS
            "priority": "HIGH",
            "confidence": 0.95,
            "themes": ["01-ESTRUTURA-TIME", "02-PROCESSO-VENDAS"],
            "extracted_at": "2026-03-05T12:30:00Z"
          }
        ],
        "MEDIUM": [...],
        "LOW": [...]
      }
    },
    "themes": {
      "processo-vendas": {
        "HIGH": [...],
        "MEDIUM": [...],
        "LOW": [...]
      }
    }
  }
}

DNA Layer Mapping

| Layer | Name | Example Insight |
|---|---|---|
| L1 | PHILOSOPHIES | "Sales is a transfer of belief" |
| L2 | MENTAL-MODELS | "Think in systems, not tactics" |
| L3 | HEURISTICS | "If close rate < 20%, problem is qualification" |
| L4 | FRAMEWORKS | "CLOSER framework: C-L-O-S-E-R steps" |
| L5 | METHODOLOGIES | "Step 1: Clarify problem. Step 2: Label pain…" |

Source: core/schemas/insights-state.schema.json:1-xxx
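
Because insights are bucketed by priority but tagged with a DNA layer, regrouping them by layer is a common read pattern. The sketch below assumes the nesting shown in the Structure example (`insights_state` → `persons` → name → priority list); `insights_by_layer` is a hypothetical helper.

```python
from collections import defaultdict

PRIORITIES = ("HIGH", "MEDIUM", "LOW")


def insights_by_layer(state: dict, person: str) -> dict[str, list[str]]:
    """Group one person's insight IDs by dna_layer, highest priority first."""
    buckets = state.get("insights_state", {}).get("persons", {}).get(person, {})
    grouped = defaultdict(list)
    for priority in PRIORITIES:
        for insight in buckets.get(priority, []):
            # Insights without a dna_layer are collected under "UNKNOWN".
            layer = insight.get("dna_layer", "UNKNOWN")
            grouped[layer].append(insight["insight_id"])
    return dict(grouped)
```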

narratives-state.schema.json

Synthesized narratives with patterns and tensions.

Schema Version: 1.0.0

Structure

{
  "metadata": {
    "version": 3,
    "updated_at": "2026-03-05T14:30:00Z"
  },
  "narratives_state": {
    "persons": {
      "Cole Gordon": {
        "narrative": "Cole Gordon's sales philosophy centers on...",
        "last_updated": "2026-03-05T14:00:00Z",
        "scope": "sales_methodology",
        "corpus": ["CG001", "CG002", "CG003"],
        "insights_included": ["INS-CG001-001", "INS-CG001-005", ...],
        "patterns_identified": [
          {
            "pattern": "Emphasis on team structure over individual performance",
            "evidence": ["CG001-005", "CG001-012"],
            "frequency": "recurring"
          }
        ],
        "tensions": [
          {
            "tension": "Balance between setter autonomy and farm system structure",
            "manifestation": "Wants setters to be creative but follow farm ratios",
            "evidence": ["CG001-008", "CG002-003"]
          }
        ],
        "open_loops": [
          {
            "question": "How to scale farm system beyond 50 closers?",
            "context": "CG001-015",
            "importance": "HIGH"
          }
        ],
        "next_questions": [
          "What's the maximum setter-to-closer ratio before quality drops?",
          "How does farm system adapt for different price points?"
        ]
      }
    },
    "themes": {
      "processo-vendas": {
        "narrative": "...",
        "perspectives": [
          {
            "person": "Cole Gordon",
            "viewpoint": "Farm system with 1:3 setter-closer ratio",
            "evidence": ["CG001-005"]
          },
          {
            "person": "Jeremy Miner",
            "viewpoint": "NEPQ methodology for consultative selling",
            "evidence": ["JM001-003"]
          }
        ],
        "consensus_points": [
          "Qualification is more important than closing skills"
        ],
        "tensions": [
          "Farm system (Cole) vs solo closer model (Jeremy)"
        ]
      }
    }
  }
}

Usage

# Use narratives for knowledge extraction
/extract-knowledge "auto"  # Reads NARRATIVES-STATE.json
Source: core/schemas/narratives-state.schema.json:1-xxx

file-registry.schema.json

Processed file tracking with checksums.

Structure

{
  "metadata": {
    "version": 42,
    "updated_at": "2026-03-05T14:30:00Z"
  },
  "files": [
    {
      "source_id": "CG001",
      "source_file": "inbox/COLE-GORDON/PODCASTS/farm-system.txt",
      "source_hash": "sha256:...",
      "source_name": "Cole Gordon",
      "source_company": "Cole Gordon",
      "processed_at": "2026-03-05T12:00:00Z",
      "chunk_count": 23,
      "status": "complete",
      "artifacts": [
        "/processing/chunks/CG001.json",
        "/knowledge/dossiers/persons/COLE-GORDON.md"
      ]
    }
  ]
}
Source: core/schemas/file-registry.schema.json:1-xxx
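
The `source_hash` field makes it possible to skip files that have already been processed. The following sketch shows one way to do that, assuming the hash is stored as `sha256:` followed by the hex digest of the file contents and that `"complete"` is the status of a finished file. Both function names are hypothetical.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hash a file in 64 KiB blocks and return it in 'sha256:<hex>' form."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return "sha256:" + digest.hexdigest()


def needs_processing(registry: dict, source_file: str, current_hash: str) -> bool:
    """True when the file is new, has changed, or was not fully processed."""
    for entry in registry.get("files", []):
        if entry.get("source_file") == source_file:
            changed = entry.get("source_hash") != current_hash
            unfinished = entry.get("status") != "complete"
            return changed or unfinished
    return True  # not registered yet
```

A caller would typically compute `sha256_of(path)` once and pass the result to `needs_processing` before starting the pipeline.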

decisions-registry.schema.json

Council decisions and precedents.

Structure

{
  "metadata": {
    "version": 7,
    "updated_at": "2026-03-05T14:30:00Z"
  },
  "decisions": [
    {
      "decision_id": "20260305130249-CRO-CFO",
      "query": "Should we increase closer commission from 10% to 15%?",
      "date": "2026-03-05T13:02:49Z",
      "participants": ["CRO", "CFO", "CMO"],
      "council": ["critico-metodologico", "advogado-do-diabo", "sintetizador"],
      "recommendation": "Pilot 15% with top 20% performers for Q2",
      "confidence": 72,
      "chunk_ids": ["CG001-005", "HR003-012"],
      "sources": [
        "/knowledge/SOURCES/COLE-GORDON/04-COMISSIONAMENTO/closer-comp.md"
      ],
      "residual_risks": [
        "May increase CAC if close rate doesn't improve"
      ],
      "next_steps": [
        {
          "action": "Design pilot program criteria",
          "owner": "CRO",
          "deadline": "2026-03-15"
        }
      ]
    }
  ],
  "precedents": [
    {
      "precedent_id": "PREC-2026-001",
      "pattern": "Commission increase decisions",
      "guideline": "Always pilot with top performers first",
      "based_on": ["20260305130249-CRO-CFO", "20260201142035-CRO-CFO"]
    }
  ]
}
Source: core/schemas/decisions-registry.schema.json:1-xxx
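
Precedents point back to decisions through `based_on`. As a minimal sketch of how that link can be queried and checked, assuming the field names in the Structure example (both helpers are hypothetical):

```python
def precedents_for(registry: dict, decision_id: str) -> list[str]:
    """Return the IDs of precedents that cite the given decision."""
    return [
        p["precedent_id"]
        for p in registry.get("precedents", [])
        if decision_id in p.get("based_on", [])
    ]


def unresolved_precedent_refs(registry: dict) -> dict[str, list[str]]:
    """Map each precedent ID to the based_on entries missing from decisions."""
    known = {d["decision_id"] for d in registry.get("decisions", [])}
    missing = {}
    for precedent in registry.get("precedents", []):
        absent = [ref for ref in precedent.get("based_on", []) if ref not in known]
        if absent:
            missing[precedent["precedent_id"]] = absent
    return missing
```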

ID System

Source IDs

Format: PREFIX + NNN
Examples: CG001, JL003, HR005

Registered Prefixes:

| Prefix | Person/Channel | Company |
|---|---|---|
| JL | Jordan Lee | AI Business |
| CJ | Charlie Johnson Show | - |
| MT | Max Tornow | Max Tornow Podcast |
| HR | Alex Hormozi | - |
| CG | Cole Gordon | - |
| SS | Sam Oven | Setterlun University |
| JM | Jeremy Miner | 7th Level |

Chunk IDs

Format: {SOURCE_ID}-{NNN}
Examples: CG001-001, JL003-015

Decision IDs

Format: YYYYMMDDHHMMSS-{ORIGIN}-{DEST}
Example: 20260305130249-CRO-CFO

Precedent IDs

Format: PREC-YYYY-NNN
Example: PREC-2026-001
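
The four ID formats can be checked with regular expressions. This is a sketch under stated assumptions: prefixes are exactly two uppercase letters (true of every registered prefix), NNN is exactly three digits, and ORIGIN and DEST are uppercase words. The parser names are hypothetical.

```python
import re
from datetime import datetime

SOURCE_ID = re.compile(r"(?P<prefix>[A-Z]{2})(?P<number>\d{3})")
CHUNK_ID = re.compile(r"(?P<source_id>[A-Z]{2}\d{3})-(?P<index>\d{3})")
DECISION_ID = re.compile(r"(?P<stamp>\d{14})-(?P<origin>[A-Z]+)-(?P<dest>[A-Z]+)")
PRECEDENT_ID = re.compile(r"PREC-(?P<year>\d{4})-(?P<number>\d{3})")


def parse_source_id(value: str):
    """'CG001' -> ('CG', 1); None when the format does not match."""
    m = SOURCE_ID.fullmatch(value)
    return (m["prefix"], int(m["number"])) if m else None


def parse_chunk_id(value: str):
    """'CG001-001' -> ('CG001', 1); None when the format does not match."""
    m = CHUNK_ID.fullmatch(value)
    return (m["source_id"], int(m["index"])) if m else None


def parse_decision_id(value: str):
    """'20260305130249-CRO-CFO' -> (datetime, 'CRO', 'CFO'); None if invalid."""
    m = DECISION_ID.fullmatch(value)
    if not m:
        return None
    try:
        stamp = datetime.strptime(m["stamp"], "%Y%m%d%H%M%S")
    except ValueError:  # 14 digits that do not form a real timestamp
        return None
    return (stamp, m["origin"], m["dest"])


def parse_precedent_id(value: str):
    """'PREC-2026-001' -> (2026, 1); None when the format does not match."""
    m = PRECEDENT_ID.fullmatch(value)
    return (int(m["year"]), int(m["number"])) if m else None
```

Returning `None` instead of raising keeps the parsers easy to use in filters; a stricter pipeline might prefer to raise `ValueError`.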

Foreign Keys

Traceability graph:
file-registry.json
  ├─ source_id ──────────────────┐
  └─ chunk_count                 │
                                 │
CHUNKS-STATE.json ◄──────────────┘
  ├─ source_id
  └─ chunks[]
      └─ chunk_id ───────────────┐
                                 │
INSIGHTS-STATE.json ◄────────────┼───────────┐
  └─ chunk_id                    │           │
      └─ insight_id ─────────────│───────────┤
                                 │           │
NARRATIVES-STATE.json ◄──────────┘           │
  └─ evidence_chain[] (chunk_ids)            │
                                             │
decisions-registry.json ◄────────────────────┘
  ├─ chunk_ids[]
  └─ sources[] (knowledge files)
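
The references in the graph can be checked mechanically. The sketch below covers two of the edges (insights to chunks, decisions to chunks), assuming the field names used in the Structure examples on this page; all three function names are hypothetical.

```python
def known_chunk_ids(chunks_state: dict) -> set[str]:
    """Collect every chunk_id stored in CHUNKS-STATE.json."""
    return {
        chunk["chunk_id"]
        for source in chunks_state.get("chunks_by_source", {}).values()
        for chunk in source.get("chunks", [])
    }


def orphan_insights(chunks_state: dict, insights_state: dict) -> list[str]:
    """Return insight IDs whose chunk_id does not exist in the chunk state."""
    known = known_chunk_ids(chunks_state)
    orphans = []
    # Top-level scopes are "persons" and "themes"; each maps a name to
    # priority buckets (HIGH / MEDIUM / LOW) holding lists of insights.
    for scope in insights_state.get("insights_state", {}).values():
        for buckets in scope.values():
            for insights in buckets.values():
                for insight in insights:
                    if insight.get("chunk_id") not in known:
                        orphans.append(insight["insight_id"])
    return orphans


def dangling_decision_refs(chunks_state: dict, registry: dict) -> dict[str, list[str]]:
    """Map decision IDs to the chunk_ids they cite that cannot be found."""
    known = known_chunk_ids(chunks_state)
    dangling = {}
    for decision in registry.get("decisions", []):
        missing = [c for c in decision.get("chunk_ids", []) if c not in known]
        if missing:
            dangling[decision["decision_id"]] = missing
    return dangling
```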

Validation Tools

Python

import json
import jsonschema

def validate_state_file(state_file, schema_file):
    with open(schema_file) as f:
        schema = json.load(f)
    with open(state_file) as f:
        data = json.load(f)
    
    try:
        jsonschema.validate(data, schema)
        return True, "Valid"
    except jsonschema.ValidationError as e:
        return False, str(e)

CLI

# Validate all state files
python3 core/intelligence/validate_json_integrity.py

# Validate single file
python3 -m jsonschema -i processing/chunks/CHUNKS-STATE.json core/schemas/chunks-state.schema.json

Schema Evolution

Version Increment Rules

  1. Never delete fields; mark them as deprecated instead.
  2. Always validate before saving, using jsonschema.
  3. Increment the version on each schema change.
  4. Maintain change_log for auditability.
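
Rules 2 and 4 can be combined into a single save routine. The following is a minimal sketch, not Mega Brain's own writer: `save_state` is a hypothetical name, the validator is passed in as a callable, refreshing `metadata.updated_at` on every save is an assumption, and `metadata.version` is left untouched because this page does not state when that counter is bumped.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Callable


def save_state(state: dict, path: str, validate: Callable[[dict], None], action: str) -> dict:
    """Validate, log the change, then write atomically. Returns what was saved."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Work on a deep copy so the caller's dict is unchanged if validation fails.
    candidate = json.loads(json.dumps(state))
    candidate.setdefault("metadata", {})["updated_at"] = now
    candidate.setdefault("change_log", []).append({"timestamp": now, "action": action})

    validate(candidate)  # must raise on invalid data; nothing is on disk yet

    # Write to a temporary file in the same directory, then swap it into place
    # so readers never observe a half-written state file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(candidate, f, indent=2, ensure_ascii=False)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return candidate
```

With the jsonschema package, the validator argument can be `lambda s: jsonschema.validate(s, schema)`.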

Migration

When a schema changes:
  1. Create migration script: scripts/migrate_v{N}_to_v{N+1}.py
  2. Update schema file with new version
  3. Run migration on all state files
  4. Validate with new schema
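
The steps above can be sketched as a small chain of migration functions. Everything here is illustrative: the `metadata.schema_version` key, the `language` field added by the example step, and all function names are hypothetical, since this page does not show an actual migration script.

```python
from typing import Callable

Migration = Callable[[dict], dict]
MIGRATIONS: dict[int, Migration] = {}  # from_version -> migration function


def migration(from_version: int):
    """Decorator that registers a step migrating from_version to from_version + 1."""
    def register(func: Migration) -> Migration:
        MIGRATIONS[from_version] = func
        return func
    return register


@migration(1)
def migrate_v1_to_v2(state: dict) -> dict:
    # Hypothetical change: add an optional "language" field to every source.
    # Existing fields are kept, in line with the "never delete fields" rule.
    for source in state.get("chunks_by_source", {}).values():
        source.setdefault("language", None)
    return state


def migrate(state: dict, target: int) -> dict:
    """Apply registered steps one by one until the state reaches `target`."""
    current = state.setdefault("metadata", {}).get("schema_version", 1)
    while current < target:
        step = MIGRATIONS.get(current)
        if step is None:
            raise LookupError(f"no migration registered from v{current}")
        state = step(state)
        current += 1
        state.setdefault("metadata", {})["schema_version"] = current
    return state
```

After `migrate` returns, the result should still be validated against the new schema before it is written back, as step 4 requires.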
