Overview
The TextBlock type represents a processed text unit that has been assembled from raw OCR observations and enriched with semantic metadata about its role within the document.
TextBlock extends the base Observation type with additional metadata flags that guide formatting and structure reconstruction.
Type Definition
src/types.ts:389
Base Observation Type
Every TextBlock includes the fundamental properties from Observation:
src/types.ts:170, src/types.ts:11
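As a rough sketch of the shape described above (not a copy of src/types.ts), the types might look like the following. The field names `bbox`, `text`, `x`, `y`, `width`, and `height` are assumptions made for illustration; only the four flag names come from this page. The sample values are made up, apart from `x` and `width`, which reuse the numbers from Example 1.

```typescript
// Hypothetical shapes for illustration; the real definitions live in src/types.ts.
type BoundingBox = { x: number; y: number; width: number; height: number };

// Assumed base type: recognized text plus its position on the page.
type Observation = { bbox: BoundingBox; text: string };

// TextBlock adds the four metadata flags documented on this page.
type TextBlock = Observation & {
    isCentered?: boolean;
    isFootnote?: boolean;
    isHeading?: boolean;
    isPoetic?: boolean;
};

// A block that is both centered and a heading (sample values).
const heading: TextBlock = {
    bbox: { x: 298, y: 120, width: 286, height: 40 },
    text: 'Chapter One',
    isCentered: true,
    isHeading: true,
};
```

All flags are modeled as optional here so that plain prose blocks can omit them entirely; whether the real type does the same is not confirmed by this page.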
Metadata Properties
isCentered
Indicates whether the text is centered on the page with adequate whitespace on both sides.
Detection Criteria
Text is considered centered when it meets both conditions:
- Center Point Alignment: The text’s center is within tolerance of the page center
- Sufficient Margins: Adequate whitespace exists on both left and right sides
Reference: src/utils/layout.ts:36
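The two conditions can be sketched as a small predicate. This is an illustration only, not the code at src/utils/layout.ts:36: the function name `isCenteredOnPage`, the option names `centerTolerance` and `minMargin`, and the idea of expressing both thresholds as fractions of the page width are all assumptions.

```typescript
type Box = { x: number; width: number };

// Hypothetical thresholds, both expressed as fractions of the page width.
type CenterOptions = { centerTolerance: number; minMargin: number };

const isCenteredOnPage = (box: Box, pageWidth: number, options: CenterOptions): boolean => {
    const textCenter = box.x + box.width / 2;
    const pageCenter = pageWidth / 2;
    const leftMargin = box.x;
    const rightMargin = pageWidth - (box.x + box.width);

    // Condition 1: the text's center is close enough to the page center.
    const centerAligned = Math.abs(textCenter - pageCenter) <= options.centerTolerance * pageWidth;

    // Condition 2: there is enough whitespace on both sides.
    const minMarginPx = options.minMargin * pageWidth;
    const hasMargins = leftMargin >= minMarginPx && rightMargin >= minMarginPx;

    return centerAligned && hasMargins;
};
```

With the numbers from Example 1 (`x: 298`, `width: 286`, page width 960) and assumed thresholds of `centerTolerance: 0.05` and `minMargin: 0.1`, the check passes. A full-width line such as `{ x: 40, width: 880 }` shares the same center point but fails the margin condition.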
Common Use Cases
- Document titles and headings
- Poetry and verse content
- Section headers
- Epigraphs or quotes
Configuration
isFootnote
Indicates whether the text appears below horizontal line separators and should be treated as footnote content.
Detection Logic
Footnotes are identified by their position relative to horizontal lines. The internal algorithm:
- Filters out horizontal lines contained within rectangles (headers)
- Excludes lines very close to the top edge (artifacts)
- Returns the Y-coordinate of the last qualifying line
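The three steps above can be sketched as follows. This is a hedged illustration rather than the library's implementation: `findFootnoteLineY`, `contains`, and the `minY` cutoff are hypothetical names, and "last qualifying line" is interpreted here as the line lowest on the page (largest Y), which is an assumption.

```typescript
type Rect = { x: number; y: number; width: number; height: number };

// True when `inner` lies fully within `outer`.
const contains = (outer: Rect, inner: Rect): boolean =>
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.width <= outer.x + outer.width &&
    inner.y + inner.height <= outer.y + outer.height;

const findFootnoteLineY = (lines: Rect[], rectangles: Rect[], minY: number): number | undefined => {
    const qualifying = lines
        // Step 1: drop lines that belong to a rectangle (for example a boxed header).
        .filter((line) => !rectangles.some((rect) => contains(rect, line)))
        // Step 2: drop lines too close to the top edge (likely artifacts).
        .filter((line) => line.y > minY)
        // Order top to bottom so the last element is the lowest line.
        .sort((a, b) => a.y - b.y);

    // Step 3: return the Y-coordinate of the last qualifying line, if any.
    return qualifying[qualifying.length - 1]?.y;
};
```

A block would then be flagged as a footnote when its Y position is greater than the returned value, as in Example 3, where text at 1050px sits below a line at 1000px.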
Use Cases
- Academic citations
- Reference notes
- Supplementary information
- Translation notes
Processing
Footnotes are processed separately from body text.
Reference: src/utils/paragraphs.ts:238
isHeading
Indicates whether the text represents a heading, typically identified by visual presentation.
Detection Logic
Headings are identified when text is contained within rectangular borders.
Reference: src/utils/paragraphs.ts:122
The containment check includes pixel tolerance for OCR inaccuracies.
Reference: src/utils/layout.ts:116
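A tolerance-aware containment check might look like the sketch below. The function name `isInsideWithTolerance` and the single `tolerance` parameter (in pixels, applied to all four edges) are assumptions for illustration; the actual check at src/utils/layout.ts:116 may differ.

```typescript
type Rect = { x: number; y: number; width: number; height: number };

// True when `inner` fits inside `outer` after growing `outer` by `tolerance` pixels on every side.
const isInsideWithTolerance = (inner: Rect, outer: Rect, tolerance: number): boolean =>
    inner.x >= outer.x - tolerance &&
    inner.y >= outer.y - tolerance &&
    inner.x + inner.width <= outer.x + outer.width + tolerance &&
    inner.y + inner.height <= outer.y + outer.height + tolerance;
```

The tolerance matters because OCR bounding boxes often spill a few pixels past the drawn border. For example, text at `{ x: 98, y: 205, width: 404, height: 60 }` inside a border at `{ x: 100, y: 200, width: 400, height: 80 }` is rejected with a tolerance of 0 but accepted with a tolerance of 5.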
Formatting
Headings receive special formatting with blank lines after them.
Reference: src/index.ts:20
Use Cases
- Chapter titles
- Section headings
- Boxed announcements
- Highlighted content
isPoetic
Indicates whether the text is identified as poetry or verse content that should preserve line breaks.
Detection Heuristics
Poetry detection uses multiple coordinated heuristics.
Reference: src/utils/poetry.ts:365
See Poetry Detection for detailed algorithms.
Preservation
Poetic lines are never merged into paragraphs.
Reference: src/utils/paragraphs.ts:204
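To illustrate the preservation rule, here is a deliberately simplified grouping routine. It assumes every run of consecutive non-poetic lines forms one paragraph, which the real logic at src/utils/paragraphs.ts:204 almost certainly refines; `groupIntoParagraphs` and the `Line` type are hypothetical names.

```typescript
type Line = { text: string; isPoetic?: boolean };

const groupIntoParagraphs = (lines: Line[]): string[] => {
    const paragraphs: string[] = [];
    let buffer: string[] = [];

    // Emit the prose collected so far as a single paragraph.
    const flush = (): void => {
        if (buffer.length > 0) {
            paragraphs.push(buffer.join(' '));
            buffer = [];
        }
    };

    for (const line of lines) {
        if (line.isPoetic) {
            // A poetic line ends the current paragraph and is kept on its own.
            flush();
            paragraphs.push(line.text);
        } else {
            buffer.push(line.text);
        }
    }

    flush();
    return paragraphs;
};
```

Given two prose lines, two poetic lines, and one more prose line, this returns four entries: the merged prose, each verse line separately, and the trailing prose.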
Use Cases
- Classical poetry
- Quranic verses
- Song lyrics
- Formatted verse content
Real-World Examples
Example 1: Centered Heading
- Page width: 960px
- Text center: 298 + 286/2 = 441px
- Page center: 960/2 = 480px
- Difference: 39px (4% of page width) ✓
- Left margin: 298px (31% of page width) ✓
- Right margin: 376px (39% of page width) ✓
Example 2: Poetry Pair (Hemistichs)
- Similar widths: 220px vs 210px (5% difference) ✓
- Word counts: 3 vs 3 (equal) ✓
- Vertical gap: 0px (same line) ✓
- Combined centering: (150 + 640) / 2 = 395px, center = 400px ✓
Example 3: Footnote
- Last horizontal line Y: 1000px
- Text Y position: 1050px (below line) ✓
Example 4: Regular Prose
Using Metadata in Processing
Conditional Formatting
Filtering
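One common filtering task is separating footnotes from body text using the isFootnote flag. The sketch below is an assumption about how a consumer might do this with plain array methods; `splitFootnotes` and the `Block` type are hypothetical and not part of the library's API.

```typescript
type Block = { text: string; isFootnote?: boolean; isHeading?: boolean };

// Partition blocks into body text and footnotes, preserving their original order.
const splitFootnotes = (blocks: Block[]): { body: Block[]; footnotes: Block[] } => ({
    body: blocks.filter((block) => !block.isFootnote),
    footnotes: blocks.filter((block) => block.isFootnote === true),
});

// Collect only the heading texts, for example to build an outline.
const headingTexts = (blocks: Block[]): string[] =>
    blocks.filter((block) => block.isHeading === true).map((block) => block.text);
```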
Export to Different Formats
Metadata Interaction
Multiple Flags
A TextBlock can have multiple metadata flags.
Priority Rules
When multiple flags are present, formatting follows this priority:
- isHeading (highest priority, structural)
- isPoetic (preserve line breaks)
- isFootnote (section separation)
- isCentered (visual hint)
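The priority order above can be expressed as a chain of early returns. This is a sketch under the assumption that a formatter first resolves each block to a single role; the `Role` labels and the `resolveRole` function are hypothetical.

```typescript
type Block = {
    text: string;
    isCentered?: boolean;
    isFootnote?: boolean;
    isHeading?: boolean;
    isPoetic?: boolean;
};

type Role = 'heading' | 'poetry' | 'footnote' | 'centered' | 'body';

// Check flags from highest to lowest priority; the first match wins.
const resolveRole = (block: Block): Role => {
    if (block.isHeading) return 'heading';
    if (block.isPoetic) return 'poetry';
    if (block.isFootnote) return 'footnote';
    if (block.isCentered) return 'centered';
    return 'body';
};
```

A block flagged as both isCentered and isHeading resolves to `'heading'`, and a block with no flags falls through to `'body'`.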
API Reference
Observation Type
Base type for OCR text observations
BoundingBox Type
Position and dimension properties
ReconstructResult
Complete pipeline output structure
Pipeline Overview
How TextBlocks flow through processing
Next Steps
Poetry Detection
Deep dive into poetry identification
Processing Pipeline
Understand the three-stage pipeline