The Mega Brain pipeline is a semantic processing system that ingests expert materials and transforms them into structured, traceable knowledge across 5 DNA layers.

Pipeline Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                           PIPELINE JARVIS v2.1                               │
│                                                                              │
│  8 PHASES                                                                    │
│  ├─ Phase 1: INITIALIZATION + VALIDATION                                    │
│  ├─ Phase 2: CHUNKING                                                       │
│  ├─ Phase 3: ENTITY RESOLUTION                                              │
│  ├─ Phase 4: INSIGHT EXTRACTION                                             │
│  ├─ Phase 5: NARRATIVE SYNTHESIS                                            │
│  ├─ Phase 6: DOSSIER COMPILATION                                            │
│  ├─ Phase 7: AGENT ENRICHMENT                                               │
│  └─ Phase 8: FINALIZATION + EXECUTION REPORT                                │
└─────────────────────────────────────────────────────────────────────────────┘
Core Constraint: Process 100% of content. No summarization, no omission. Every insight must trace back to source with full lineage.

Phase 1: Initialization + Validation

Validate input files, extract metadata from paths, load state files, and check for duplicate processing.

Step 1: Validate Input

⛔ CHECKPOINT PRE-1.1
[ ] File exists in $ARGUMENTS
[ ] File has content (> 100 chars)
[ ] Metadata identifiable (source person/company)

Step 2: Extract Path Metadata

Parse file path to extract:
  • SOURCE_PERSON - Folder after inbox/
  • SOURCE_COMPANY - Content in parentheses
  • SOURCE_TYPE - Material type (MASTERMINDS, COURSES, etc.)
  • SOURCE_ID - Unique source identifier (e.g., "CG003")
  • SCOPE - course | company | personal
  • CORPUS - Derived from SOURCE_COMPANY
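
In Python, this parsing might look like the following sketch. The inbox layout `inbox/<person> (<company>)/<type>/<file>` is an assumption inferred from the fields above, and `extract_path_metadata` is a hypothetical helper, not part of the pipeline's actual code:

```python
import re
from pathlib import Path

def extract_path_metadata(path: str) -> dict:
    """Parse source metadata from an inbox file path.

    Assumed layout (illustrative only):
    inbox/Cole Gordon (The Scalable Company)/MASTERMINDS/session-3.txt
    """
    parts = Path(path).parts
    inbox_idx = parts.index("inbox")
    person_dir = parts[inbox_idx + 1]  # e.g. "Cole Gordon (The Scalable Company)"
    company = re.search(r"\((.*?)\)", person_dir)
    return {
        # Person name is the folder name with the parenthesized company removed
        "SOURCE_PERSON": re.sub(r"\s*\(.*?\)", "", person_dir).strip(),
        "SOURCE_COMPANY": company.group(1) if company else None,
        # Material type is the next folder down, if the path is deep enough
        "SOURCE_TYPE": parts[inbox_idx + 2] if len(parts) > inbox_idx + 3 else None,
    }
```
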

Step 3: Load State Files

Load or create:
  • CHUNKS-STATE.json
  • CANONICAL-MAP.json
  • INSIGHTS-STATE.json
  • NARRATIVES-STATE.json

Step 4: Check Already Processed

Search existing chunks for SOURCE_ID. If found, ask user whether to reprocess.
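
A minimal duplicate check might look like this sketch. The CHUNKS-STATE.json shape (a top-level "chunks" list whose entries carry `meta.source_id`, as in the Phase 2 chunk example) is an assumption:

```python
import json

def already_processed(source_id: str, chunks_state_path: str = "CHUNKS-STATE.json") -> bool:
    """Return True if any stored chunk came from this source."""
    try:
        with open(chunks_state_path) as f:
            state = json.load(f)
    except FileNotFoundError:
        # No state file yet means nothing has been processed
        return False
    return any(c.get("meta", {}).get("source_id") == source_id
               for c in state.get("chunks", []))
```

If this returns True, the pipeline prompts the user before reprocessing rather than silently duplicating chunks.
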
Output: Validated input, metadata extracted, state files loaded

Phase 2: Chunking

Segment content into semantic chunks (~300 words) while preserving context, timestamps, and speaker labels.
Protocol: core/templates/PIPELINE/PROMPT-1.1-CHUNKING.md

Chunking Rules

  • Chunk size: ~300 words (~1000 tokens)
  • Preserve: Timestamps, speaker labels, formatting
  • Extract: People (raw mentions), themes (raw topics)
  • Generate: Sequential chunk_id like CG003-001

Process

Step 1: Read Full Content

Load entire source file, count words

Step 2: Execute Chunking

Apply semantic segmentation while maintaining context boundaries

Step 3: Merge and Save

{
  "chunk_id": "CG003-001",
  "content": "[chunk text]",
  "meta": {
    "source_id": "CG003",
    "source_person": "Cole Gordon",
    "timestamp": "00:05:23",
    "word_count": 287
  },
  "entities": {
    "pessoas": ["Cole Gordon", "Alex Hormozi"],
    "temas": ["Sales", "Closing Techniques"]
  }
}
Merge new chunks into CHUNKS-STATE.json, deduplicate by chunk_id
Checkpoint: count(new_chunks) > 0, each chunk has unique ID, state file saved
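
The merge-and-save step can be sketched as follows, again assuming CHUNKS-STATE.json holds a top-level "chunks" list; `merge_chunks` is a hypothetical helper:

```python
import json
from pathlib import Path

def merge_chunks(new_chunks: list[dict], state_path: str = "CHUNKS-STATE.json") -> int:
    """Merge new chunks into the state file, deduplicating by chunk_id.

    Returns the number of chunks actually added.
    """
    path = Path(state_path)
    state = json.loads(path.read_text()) if path.exists() else {"chunks": []}
    seen = {c["chunk_id"] for c in state["chunks"]}
    added = [c for c in new_chunks if c["chunk_id"] not in seen]
    state["chunks"].extend(added)
    path.write_text(json.dumps(state, indent=2, ensure_ascii=False))
    return len(added)
```

Deduplicating by chunk_id is what makes reruns safe: reprocessing the same source appends nothing.
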

Phase 3: Entity Resolution

Normalize entity names (people, companies, themes) to canonical forms to prevent duplication.
Protocol: core/templates/PIPELINE/PROMPT-1.2-ENTITY-RESOLUTION.md

Resolution Rules

  • Threshold: 0.85 confidence for merging
  • Prefer: Longest/most explicit form as canonical
  • NEVER merge: Across different corpora
  • Flag collisions: For human review
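
The matching rules can be sketched with stdlib fuzzy matching; `difflib.SequenceMatcher` is a stand-in for whatever similarity scoring the protocol actually uses, and the corpus-scoped "corpus:alias" key layout is an assumed map structure:

```python
from difflib import SequenceMatcher

def resolve_entity(raw: str, canonical_map: dict[str, str],
                   corpus: str, threshold: float = 0.85) -> str:
    """Map a raw mention to a canonical form within one corpus.

    canonical_map keys look like "corpus:alias" (lowercased alias);
    values are canonical names. Returns the best match at or above
    threshold, else the raw mention (left for the human review queue).
    """
    key = f"{corpus}:{raw.lower()}"
    if key in canonical_map:          # exact alias hit
        return canonical_map[key]
    best_name, best_score = raw, 0.0
    for alias_key, canonical in canonical_map.items():
        alias_corpus, alias = alias_key.split(":", 1)
        if alias_corpus != corpus:    # NEVER merge across corpora
            continue
        score = SequenceMatcher(None, raw.lower(), alias).ratio()
        if score > best_score:
            best_name, best_score = canonical, score
    return best_name if best_score >= threshold else raw
```
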

Examples

Raw Mentions → Canonical Form
"Cole", "Cole G", "Cole Gordon" → Cole Gordon
"Hormozi", "Alex H", "Alex Hormozi" → Alex Hormozi
"TSC", "The Scalable Company" → The Scalable Company
Output: Canonicalized chunks, updated CANONICAL-MAP.json, review queue for ambiguous cases

Phase 4: Insight Extraction

Extract actionable insights from chunks, classify by priority, and detect contradictions.
Protocol: core/templates/PIPELINE/PROMPT-2.1-INSIGHT-EXTRACTION.md

Insight Structure

{
  "chunk_id": "CG003-042",
  "insight": "Respond to leads within 5 minutes to capture 80% higher conversion rate",
  "priority": "high",
  "scope": "company",
  "corpus": "Sales Training",
  "confidence": 0.92,
  "status": "new",
  "source": {
    "source_id": "CG003",
    "source_title": "NEPQ Masterclass Session 3",
    "source_type": "MASTERMINDS"
  }
}
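
Records like the one above can be checked against the structure before they enter INSIGHTS-STATE.json; `validate_insight` is a hypothetical helper, not part of the protocol file:

```python
REQUIRED = {"chunk_id", "insight", "priority", "scope", "corpus",
            "confidence", "status", "source"}

def validate_insight(rec: dict) -> list[str]:
    """Return a list of problems with an insight record (empty = valid)."""
    problems = [f"missing field: {k}" for k in REQUIRED - rec.keys()]
    if rec.get("priority") not in {"high", "medium", "low"}:
        problems.append("priority must be high/medium/low")
    if not 0.0 <= rec.get("confidence", -1) <= 1.0:
        problems.append("confidence must be in [0, 1]")
    return problems
```
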

Priority Levels

High

Immediately actionable, high-impact insights

Medium

Important context, strategic guidance

Low

Supporting details, background information
Output: Insights organized by person and theme in INSIGHTS-STATE.json

Phase 5: Narrative Synthesis

Synthesize insights into coherent narratives for each person and theme, tracking tensions and open questions.
Protocol: core/templates/PIPELINE/PROMPT-3.1-NARRATIVE-SYNTHESIS.md

Narrative Structure

{
  "person": "Cole Gordon",
  "narrative": "Cole Gordon's approach to sales centers on...",
  "insights_included": ["CG003-042", "CG003-067"],
  "tensions": [
    {
      "description": "Balance between speed and qualification",
      "insights": ["CG003-042", "CG005-023"]
    }
  ],
  "open_loops": [
    {
      "question": "What's the ideal team size for scaling?",
      "status": "OPEN",
      "chunk_ids": ["CG003-089"]
    }
  ],
  "next_questions": [
    "How does this scale beyond 10 salespeople?"
  ]
}

Merge Rules (CRITICAL)

  • narrative: CONCATENATE with update separator
  • insights_included[]: APPEND (never replace)
  • tensions[]: APPEND new ones
  • open_loops[]: APPEND new, mark RESOLVED when answered
  • next_questions[]: REPLACE (only exception)
Output: Updated NARRATIVES-STATE.json with synthesized narratives
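
The merge rules can be sketched as one function over the narrative records shown above; the "--- UPDATE ---" separator string is an assumption, and resolving answered open loops is left out of this sketch:

```python
def merge_narrative(existing: dict, update: dict) -> dict:
    """Apply the Phase 5 merge rules to a person/theme narrative record."""
    merged = dict(existing)
    # narrative: CONCATENATE with an update separator
    merged["narrative"] = (existing["narrative"]
                           + "\n\n--- UPDATE ---\n\n"
                           + update["narrative"])
    # insights_included: APPEND, never replace (skip duplicates)
    merged["insights_included"] = existing["insights_included"] + [
        i for i in update["insights_included"]
        if i not in existing["insights_included"]
    ]
    # tensions / open_loops: APPEND new entries
    merged["tensions"] = existing["tensions"] + update.get("tensions", [])
    merged["open_loops"] = existing["open_loops"] + update.get("open_loops", [])
    # next_questions: REPLACE (the only replace rule)
    merged["next_questions"] = update.get("next_questions",
                                          existing["next_questions"])
    return merged
```
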

Phase 6: Dossier Compilation

Compile comprehensive dossiers for persons and themes with full source traceability.
Protocol: core/templates/PIPELINE/DOSSIER-COMPILATION-PROTOCOL.md

Dossier Types

# DOSSIER: Cole Gordon

## Overview
Expert in high-ticket sales, NEPQ methodology, sales team scaling

## Core Philosophy
- L1: Philosophies extracted from sources
- L2: Mental models
- L3: Heuristics
- L4: Frameworks
- L5: Methodologies

## Key Insights
[HIGH] Respond to leads in <5 min (Source: CG003)
[HIGH] Use NEPQ framework for qualification (Source: CG001)

## Sources
- CG001: NEPQ Masterclass Session 1
- CG003: NEPQ Masterclass Session 3
Output: Markdown dossiers in knowledge/dossiers/persons/ and knowledge/dossiers/themes/

Phase 7: Agent Enrichment

Update agent knowledge and memory files with new insights, respecting agent boundaries.

Process

Step 1: Compile Knowledge Payload

Extract frameworks, techniques, metrics, and high-priority insights discovered

Step 2: Check Role Threshold

  • >=10 mentions: Flag “Create New Agent”
  • >=5 mentions: Flag “Monitor Role”

Step 3: Present Options

1. ✅ SIM (yes) - Update AGENT-*.md + MEMORY-*.md
2. 📝 APENAS MEMORY - Update memory only
3. ⏭️ PULAR - Skip for now

Step 4: Execute Updates

Update relevant agent files with new knowledge, maintaining agent voice and structure

Step 5: Template Evolution Check

If new knowledge doesn’t fit existing template structure, trigger evolution protocol
Output: Updated agent memories, optionally updated agent definitions

Phase 8: Finalization

Execute automatic cleanup, generate execution report, and verify pipeline integrity.

Automatic Actions

Step 1: RAG Index

python scripts/rag_index.py --knowledge --force

Step 2: File Registry

python scripts/file_registry.py --scan

Step 3: Session State

Update SESSION-STATE.md with processed file

Step 4: Role Tracking

Update agents/DISCOVERY/role-tracking.md

Step 5: Audit Log

Append to logs/AUDIT/audit.jsonl

Final Verification (9 Items)

[ ] CHUNKS-STATE.json contains chunks from SOURCE_ID
[ ] CANONICAL-MAP.json updated with entities
[ ] INSIGHTS-STATE.json contains insights from SOURCE_ID
[ ] NARRATIVES-STATE.json contains narrative for SOURCE_PERSON
[ ] At least 1 dossier in /knowledge/dossiers/
[ ] RAG index includes new files
[ ] file-registry.json has entry for source file
[ ] SESSION-STATE.md updated
[ ] audit.jsonl contains session entry
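
A few of these items can be checked mechanically; this sketch covers only the first and fifth, and the state-file locations and shapes are assumptions (`verify_state_files` is a hypothetical helper):

```python
import json
from pathlib import Path

def verify_state_files(source_id: str, root_dir: str = ".") -> list[str]:
    """Check a subset of the 9 verification items.

    Returns the names of failed checks (empty list = checked items passed).
    """
    failures = []
    root = Path(root_dir)
    try:
        chunks = json.loads((root / "CHUNKS-STATE.json").read_text())
        if not any(c.get("meta", {}).get("source_id") == source_id
                   for c in chunks.get("chunks", [])):
            failures.append("no chunks for source in CHUNKS-STATE.json")
    except FileNotFoundError:
        failures.append("CHUNKS-STATE.json missing")
    dossiers = root / "knowledge" / "dossiers"
    if not dossiers.is_dir() or not any(dossiers.rglob("*.md")):
        failures.append("no dossiers compiled")
    return failures
```
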

Execution Report

═══════════════════════════════════════════════════════════════════════════
                         EXECUTION REPORT
                         Pipeline Jarvis v2.1
═══════════════════════════════════════════════════════════════════════════

📅 Date: 2026-03-06
📁 Source: Cole Gordon (CG003)
📄 File: nepq-masterclass-session-3.txt

┌─────────────────────────────────────────────────────────────────────────┐
│ METRICS                                                                 │
├─────────────────────────────────────────────────────────────────────────┤
│ Chunks created:      87                                                 │
│ Entities resolved:   23                                                 │
│ Insights extracted:  156 (42 HIGH, 89 MED, 25 LOW)                     │
│ Narratives generated:  3 persons, 5 themes                             │
│ Dossiers compiled:  2 created, 1 updated                               │
│ Agents enriched: [Sales-Lead, NEPQ-Specialist]                         │
└─────────────────────────────────────────────────────────────────────────┘

✅ PIPELINE JARVIS v2.1 COMPLETE

Pipeline Commands

/process-jarvis → Run full pipeline on specified file
/ingest → Add new material to inbox
/save → Save current pipeline state
/resume → Resume interrupted pipeline

Next Steps

  • DNA Schema - Learn about the 5-layer knowledge extraction
  • Architecture - Understand the system architecture
