Ingesting Materials

Mega Brain accepts expert content in a range of formats and transforms it into structured knowledge. This guide walks you through ingesting each supported material type.

Overview

The ingestion process automatically:

Downloads and transcribes content
Extracts metadata (author, title, date)
Saves files with source traceability
Detects duplicates to prevent reprocessing
Prepares materials for the processing pipeline

All ingested materials are saved to the inbox/ directory, organized by person and content type.

Supported Material Types

YouTube Videos
PDFs & Documents
Podcasts & Audio
Courses & Training

Ingesting YouTube Videos

YouTube is the most common source. JARVIS downloads the transcript automatically using the YouTube Transcript API.

/ingest https://www.youtube.com/watch?v=EXAMPLE_ID

What happens:

Video metadata is fetched (title, author, duration)
Transcript is downloaded automatically
File is saved as inbox/[PERSON]/[TYPE]/video-title.txt
A unique SOURCE_ID is generated (e.g., CG003 for Cole Gordon video 3)

Example output:

JARVIS: Material received.

  Source:    YouTube
  Title:     "How to Create Irresistible Offers"
  Author:    Alex Hormozi
  Duration:  42:15
  Words:     6,230

  Saved to: inbox/alex-hormozi/MASTERCLASS/how-to-create-irresistible-offers.txt

  Next step: execute /process-jarvis to process.
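The naming scheme shown above can be sketched in a few lines. This is a minimal illustration, not the actual JARVIS implementation: the helper names `slugify`, `inbox_path`, and `source_id` are hypothetical, and the rule that a SOURCE_ID is the person's initials plus a three-digit counter is inferred from the CG003 example.

```python
import re


def slugify(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def inbox_path(person: str, content_type: str, title: str) -> str:
    """Build inbox/[PERSON]/[TYPE]/video-title.txt from its parts."""
    return f"inbox/{slugify(person)}/{content_type.upper()}/{slugify(title)}.txt"


def source_id(person: str, sequence: int) -> str:
    """Assumed format: initials plus a zero-padded counter, e.g. CG003."""
    initials = "".join(word[0].upper() for word in person.split())
    return f"{initials}{sequence:03d}"


print(inbox_path("Alex Hormozi", "masterclass", "How to Create Irresistible Offers"))
print(source_id("Cole Gordon", 3))
```

With the values from the example output, the first call reproduces the "Saved to" path and the second yields CG003.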

Ingesting PDFs and Documents

Upload local files including PDFs, Word documents, and text files.

/ingest /path/to/playbook-vendas.pdf

Supported formats:

PDF documents
Microsoft Word (.doc, .docx)
Plain text (.txt, .md)
Google Docs (via URL)

For best results, ensure PDFs have text layers (not just scanned images).
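Before calling /ingest on a local file, it can help to confirm that its extension is one of the supported formats listed above. The sketch below is an assumption about how such a check might look; the `SUPPORTED_EXTENSIONS` mapping and the `describe_format` helper are hypothetical and not part of Mega Brain.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the supported formats list; Google Docs are
# ingested via URL, so they have no local extension here.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF document",
    ".doc": "Microsoft Word",
    ".docx": "Microsoft Word",
    ".txt": "Plain text",
    ".md": "Plain text",
}


def describe_format(path: str) -> Optional[str]:
    """Return the format label for a local file, or None if unsupported."""
    return SUPPORTED_EXTENSIONS.get(Path(path).suffix.lower())


print(describe_format("/path/to/playbook-vendas.pdf"))
```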

Ingesting Podcasts and Audio

Process audio content through transcription.

/ingest https://podcast-url.com/episode-123

Requirements:

OPENAI_API_KEY configured (uses Whisper for transcription)
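A quick preflight check for this requirement might look like the following. It only tests that the variable is set and non-empty; it does not validate the key against the API. The function name `whisper_ready` is a hypothetical helper.

```python
import os
from typing import Mapping


def whisper_ready(env: Mapping[str, str] = os.environ) -> bool:
    """True when OPENAI_API_KEY is present and not blank."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not whisper_ready():
    print("OPENAI_API_KEY is not set; audio transcription will not work.")
```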

Content detection: The system automatically detects podcast episodes based on keywords (“podcast”, “episode”, “ep”, “interview”), file metadata, and URL patterns.

Ingesting Course Materials

Process complete training courses with multiple modules.

/ingest /path/to/course-module-1.pdf --type COURSE

Scope detection: Course materials are automatically tagged with two fields:

scope: "course" - Marks the material as part of a complete training program
corpus: "[course-name]" - Groups related modules

Organization:

inbox/
└── setterlun/
    └── COURSES/
        ├── module-01-foundation.txt
        ├── module-02-strategy.txt
        └── module-03-execution.txt
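As a rough illustration of the tagging described above, the sketch below derives the two fields from an inbox path. It assumes the corpus name is the first directory under inbox/ (as in the setterlun example); the real system may derive it differently, and `course_tags` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def course_tags(inbox_file: str) -> dict:
    """Derive scope and corpus tags for a course module path."""
    path = PurePosixPath(inbox_file)
    # Assumption: parts are ("inbox", <course-name>, "COURSES", <file>).
    return {
        "scope": "course",
        "corpus": path.parts[1],
        "module": path.stem,
    }


print(course_tags("inbox/setterlun/COURSES/module-01-foundation.txt"))
```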

Advanced Ingestion Options

Manual Metadata Override

Override auto-detected metadata:

# Specify person explicitly
/ingest https://youtube.com/watch?v=abc123 --person "Cole Gordon"

# Set content type
/ingest /path/to/file.txt --type MASTERCLASS

# Ingest and immediately process
/ingest https://youtube.com/watch?v=abc123 --process
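To make the precedence explicit, here is a minimal sketch of how explicit flags could take priority over auto-detected metadata. It parses the three flags shown above with Python's standard argparse; the function names and the shape of the `detected` dictionary are assumptions for illustration only.

```python
import argparse
import shlex


def parse_ingest(command: str) -> argparse.Namespace:
    """Parse an /ingest command line into source, person, type, and process."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "/ingest":
        raise ValueError("expected a command starting with /ingest")
    parser = argparse.ArgumentParser(prog="/ingest")
    parser.add_argument("source")
    parser.add_argument("--person")
    parser.add_argument("--type", dest="content_type")
    parser.add_argument("--process", action="store_true")
    return parser.parse_args(tokens[1:])


def apply_overrides(detected: dict, args: argparse.Namespace) -> dict:
    """Explicit flags win; anything not overridden keeps its detected value."""
    merged = dict(detected)
    if args.person:
        merged["person"] = args.person
    if args.content_type:
        merged["type"] = args.content_type
    return merged


args = parse_ingest('/ingest https://youtube.com/watch?v=abc123 --person "Cole Gordon"')
print(apply_overrides({"person": "Unknown", "type": "PODCAST"}, args))
```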

Content Type Classification

Available Content Types

Type	Auto-Detection Keywords	Use Case
PODCAST	“podcast”, “episode”, “interview”	Conversational content
MASTERCLASS	“masterclass”, “mastermind”, “training”	Expert workshops
COURSE	“course”, “module”, “lesson”	Structured programs
BLUEPRINT	“blueprint”, “pdf”, “playbook”	Strategic documents
VSL	“vsl”, “webinar”, “sales letter”	Sales presentations
SCRIPT	“script”, “template”, “copy”	Templates and frameworks
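Keyword-based detection of this kind can be sketched as follows. The keywords come from the table above, but the matching rules are assumptions: this version matches whole words only (so "transcript" does not trigger SCRIPT) and returns the first type in table order that matches. The actual detector may weigh file metadata and URL patterns as well.

```python
import re
from typing import Optional

# Keywords copied from the content type table, in table order.
CONTENT_TYPES = [
    ("PODCAST", ["podcast", "episode", "interview"]),
    ("MASTERCLASS", ["masterclass", "mastermind", "training"]),
    ("COURSE", ["course", "module", "lesson"]),
    ("BLUEPRINT", ["blueprint", "pdf", "playbook"]),
    ("VSL", ["vsl", "webinar", "sales letter"]),
    ("SCRIPT", ["script", "template", "copy"]),
]


def classify(text: str) -> Optional[str]:
    """Return the first content type whose keyword appears as a whole word."""
    # Normalize to space-separated lowercase tokens, padded with spaces so
    # that multi-word keywords such as "sales letter" can be matched too.
    normalized = " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "
    for content_type, keywords in CONTENT_TYPES:
        if any(f" {keyword} " in normalized for keyword in keywords):
            return content_type
    return None


print(classify("playbook-vendas.pdf"))
print(classify("module-01-foundation.txt"))
```

When a title matches keywords from more than one row, the table order decides; an explicit --type flag avoids the ambiguity.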

Known Source Detection

The system recognizes expert sources automatically:

Path Analysis

Analyzes the file path or URL to detect known experts:

inbox/COLE GORDON/MASTERMINDS/video.txt
       ↓
SOURCE_PERSON: Cole Gordon
SOURCE_COMPANY: Cole Gordon
CORPUS: closers_io

Known Sources Library

Detected Pattern	Person	Company	Corpus
“hormozi”, “acquisition”	Alex Hormozi	Alex Hormozi	acquisition_com
“cole gordon”, “closers”	Cole Gordon	Cole Gordon	closers_io
“leila”	Leila Hormozi	Alex Hormozi	acquisition_com
“setterlun”, “sam ovens”	Sam Ovens	Setterlun University	setterlun
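A lookup against this library can be sketched as a simple substring match on the path or URL. Two details here are assumptions rather than documented behavior: hyphens and underscores are treated as spaces so that "cole-gordon" matches "cole gordon", and the "leila" pattern is checked before "hormozi" so that a Leila Hormozi path is not attributed to Alex Hormozi.

```python
import re
from typing import Optional

# (patterns, person, company, corpus), copied from the table above.
# Assumption: the more specific "leila" entry is checked first.
KNOWN_SOURCES = [
    (("leila",), "Leila Hormozi", "Alex Hormozi", "acquisition_com"),
    (("hormozi", "acquisition"), "Alex Hormozi", "Alex Hormozi", "acquisition_com"),
    (("cole gordon", "closers"), "Cole Gordon", "Cole Gordon", "closers_io"),
    (("setterlun", "sam ovens"), "Sam Ovens", "Setterlun University", "setterlun"),
]


def detect_source(path_or_url: str) -> Optional[dict]:
    """Return source metadata for the first known pattern found, else None."""
    haystack = re.sub(r"[-_]+", " ", path_or_url.lower())
    for patterns, person, company, corpus in KNOWN_SOURCES:
        if any(pattern in haystack for pattern in patterns):
            return {
                "SOURCE_PERSON": person,
                "SOURCE_COMPANY": company,
                "CORPUS": corpus,
            }
    return None


print(detect_source("inbox/COLE GORDON/MASTERMINDS/video.txt"))
```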

Automatic Corpus Assignment

Related materials from the same expert are grouped into a corpus for cross-referencing.

Duplicate Detection

The pipeline includes 6 levels of duplicate detection to prevent reprocessing:

Detection Levels

Exact MD5 Match - Same file already processed
Content Hash - Different file, same content
Partial Match - Content is substring of processed material
YouTube ID - Same video from different source
File Registry - Cross-reference with processing history
Chunk Fingerprint - Comparison of the first 1000 characters
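Several of these levels reduce to small, well-known operations. The sketch below shows one plausible way to compute each signal with the Python standard library. The normalization rules, the choice of SHA-256 for the content hash, and all function names are assumptions; the file registry level is omitted because it depends on how processing history is stored.

```python
import hashlib
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def file_md5(data: bytes) -> str:
    """Level 1: MD5 of the raw file bytes."""
    return hashlib.md5(data).hexdigest()


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting changes are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def content_hash(text: str) -> str:
    """Level 2: hash of normalized content, stable across file re-saves."""
    return hashlib.sha256(_normalize(text).encode("utf-8")).hexdigest()


def is_partial_match(new_text: str, processed_text: str) -> bool:
    """Level 3: the new content is a substring of processed material."""
    new_norm = _normalize(new_text)
    return bool(new_norm) and new_norm in _normalize(processed_text)


def youtube_id(url: str) -> Optional[str]:
    """Level 4: extract the video ID from watch URLs and youtu.be links."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def chunk_fingerprint(text: str) -> str:
    """Level 6: fingerprint of the first 1000 characters only."""
    return hashlib.md5(text[:1000].encode("utf-8")).hexdigest()


print(youtube_id("https://www.youtube.com/watch?v=EXAMPLE_ID"))
```

Comparing video IDs rather than full URLs is what lets the same video be recognized when it arrives through a different link form.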

When duplicates are found:

┌─────────────────────────────────────────────────────────┐
│  ⛔ DUPLICATE DETECTED - PROCESSING INTERRUPTED        │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Current file:  inbox/new-video.txt                     │
│  MD5:           a1b2c3d4e5f6...                         │
│                                                         │
│  Duplicate of:  inbox/old-video.txt                     │
│  Registered:    2026-02-15                              │
│  SOURCE_ID:     CG005                                   │
│                                                         │
│  ACTION: File will NOT be processed.                    │
│                                                         │
└─────────────────────────────────────────────────────────┘

Batch Ingestion

Ingest several materials in one session by issuing one /ingest command per source:

# List of YouTube videos
/ingest https://youtube.com/watch?v=video1
/ingest https://youtube.com/watch?v=video2
/ingest https://youtube.com/watch?v=video3
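For longer lists, the commands can be generated from a list of URLs. The helper below is a hypothetical convenience, not a Mega Brain feature: it skips blank lines, comment lines, and repeated URLs, and emits one /ingest line per remaining source.

```python
from typing import Iterable, List


def ingest_commands(urls: Iterable[str]) -> List[str]:
    """Turn a list of sources into /ingest commands, skipping repeats."""
    seen = set()
    commands = []
    for raw in urls:
        url = raw.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        commands.append(f"/ingest {url}")
    return commands


sources = [
    "# List of YouTube videos",
    "https://youtube.com/watch?v=video1",
    "https://youtube.com/watch?v=video2",
    "https://youtube.com/watch?v=video1",
]
print("\n".join(ingest_commands(sources)))
```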

Verification

Check Inbox Status

View pending materials:

/inbox

Output:

═══════════════════════════════════════════════
                INBOX STATUS
═══════════════════════════════════════════════

📥 PENDING: 3 files

  Cole Gordon (2 files)
  ├─ masterclass-closing.txt (6,234 words)
  └─ podcast-interview.txt (8,901 words)

  Alex Hormozi (1 file)
  └─ offer-creation-framework.txt (4,567 words)

⭐ NEXT: /process-jarvis to process pending files
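The same summary can be approximated directly from the file system, which is useful for scripting. This sketch assumes the inbox/[PERSON]/[TYPE]/file.txt layout described earlier and counts whitespace-separated words; `pending_by_person` is a hypothetical helper, and the real /inbox command may count words or track pending status differently.

```python
from collections import defaultdict
from pathlib import Path


def pending_by_person(inbox: Path) -> dict:
    """Group .txt files under the inbox by person with their word counts."""
    summary = defaultdict(list)
    for file in sorted(inbox.rglob("*.txt")):
        # The first path component below inbox/ is the person directory.
        person = file.relative_to(inbox).parts[0]
        words = len(file.read_text(encoding="utf-8").split())
        summary[person].append((file.name, words))
    return dict(summary)


if Path("inbox").is_dir():
    for person, files in pending_by_person(Path("inbox")).items():
        print(f"{person} ({len(files)} files)")
        for name, words in files:
            print(f"  {name} ({words:,} words)")
```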

Check System Status

/jarvis-briefing

Provides:

Health score (0-100)
Processed materials count
Pending inbox files
Active agents status

Next Steps

Process Ingested Materials

Learn how to run the 5-phase processing pipeline on ingested materials

View Processing Results

Check processing logs and generated artifacts
