Overview
TextBlock represents a higher-level text unit (typically a line or paragraph) that has been assembled from individual OCR observations and enriched with semantic information about its role and characteristics within the document. This type extends the basic Observation with additional metadata that helps in document structure analysis, formatting preservation, and content classification.
Type Definition
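A minimal sketch of the shape described on this page. The field names (`bbox`, `text`, `isCentered`, `isFootnote`, `isHeading`, `isPoetic`) and the `BoundingBox` layout are assumptions inferred from the field descriptions below; check them against the library's exported types before relying on them.

```typescript
// Assumed layout of a bounding box: top-left corner plus size.
type BoundingBox = { x: number; y: number; width: number; height: number };

// Assumed base type: a single OCR observation.
type Observation = {
    bbox: BoundingBox; // position and dimensions in document coordinates
    text: string; // recognized text content
};

// Assumed shape of TextBlock: an Observation plus optional semantic flags.
type TextBlock = Observation & {
    isCentered?: boolean; // text is centered on the page
    isFootnote?: boolean; // text is footnote content
    isHeading?: boolean; // text is a heading or title
    isPoetic?: boolean; // text is a line of poetry or verse
};

// Illustrative value only; the coordinates are made up.
const example: TextBlock = {
    bbox: { x: 300, y: 80, width: 400, height: 32 },
    text: 'Chapter One',
    isCentered: true,
    isHeading: true,
};
```

Treating the flags as optional means a plain Observation is still a valid TextBlock, which matches the description of the type as an extension of Observation.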
Fields
The bounding box defining the exact position and dimensions of the text within the document coordinate system. Inherited from Observation.
The recognized text content. Inherited from Observation.
Indicates whether the text is centered on the page.
This is determined by analyzing the text’s position relative to page margins and ensuring adequate whitespace on both sides. Centered text often indicates:
- Document titles and headings
- Poetry or verse content
- Section headers
- Epigraphs or quotes
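The margin comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: `looksCentered`, its `tolerance` and `minMargin` parameters, and the `BoundingBox` layout are all hypothetical.

```typescript
// Assumed layout of a bounding box: top-left corner plus size.
type BoundingBox = { x: number; y: number; width: number; height: number };

// Hypothetical helper: a line counts as centered when its left and right
// margins are roughly equal and both are wide enough to rule out
// full-width body text.
const looksCentered = (
    bbox: BoundingBox,
    pageWidth: number,
    tolerance = 0.05, // allowed margin difference, as a fraction of page width
    minMargin = 0.1, // required whitespace on each side, as a fraction of page width
): boolean => {
    const left = bbox.x;
    const right = pageWidth - (bbox.x + bbox.width);
    const balanced = Math.abs(left - right) <= tolerance * pageWidth;
    const hasWhitespace = left >= minMargin * pageWidth && right >= minMargin * pageWidth;
    return balanced && hasWhitespace;
};
```

The whitespace requirement matters because a justified full-width line also has equal margins, yet should not be classified as centered.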
Indicates whether this text is identified as a footnote.
Footnotes are typically detected by their position relative to horizontal line elements in the document. Text appearing below the last significant horizontal line is often classified as footnote content. This classification helps in:
- Separating main content from supplementary information
- Proper document structure reconstruction
- Academic and formal document processing
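As a rough illustration of the positional rule described above, the sketch below flags text that sits below the lowest horizontal rule on the page. The helper name `isBelowLastRule`, the representation of rules as bounding boxes, and the top-left origin with `y` growing downward are assumptions made for this example; how the library decides that a line is "significant" is not covered here.

```typescript
// Assumed layout of a bounding box: top-left corner plus size.
type BoundingBox = { x: number; y: number; width: number; height: number };

// Hypothetical helper: returns true when the text starts at or below the
// bottom edge of the lowest horizontal rule. Assumes y grows downward.
const isBelowLastRule = (bbox: BoundingBox, rules: BoundingBox[]): boolean => {
    if (rules.length === 0) {
        return false; // no rule on the page, so nothing separates footnotes
    }
    const lastRuleBottom = rules.reduce((max, rule) => Math.max(max, rule.y + rule.height), 0);
    return bbox.y >= lastRuleBottom;
};
```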
Indicates whether the text represents a heading or title.
Headings are often identified by their visual presentation, such as:
- Being enclosed within rectangular borders or boxes
- Having distinctive spacing or positioning
- Being centered or specially formatted
Identifying headings is useful for:
- Document outline generation
- Hierarchical content structuring
- Navigation and indexing
Indicates whether this text is identified as a line of poetry or verse.
Poetic content is detected using multiple heuristics including:
- Line length and spacing patterns
- Word density analysis
- Centering and alignment characteristics
- Paired hemistich detection
Poetic lines are handled differently from ordinary text:
- They are not merged into standard paragraphs
- Line breaks are preserved as semantically significant
- Spacing and formatting are maintained more strictly
Usage Example
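The following is a minimal sketch of how the flags might be consumed once text blocks are available. The field names and the `BoundingBox` layout are assumptions based on the field descriptions above, and the sample blocks are hand-written rather than real OCR output.

```typescript
// Assumed shapes; verify against the library's exported types.
type BoundingBox = { x: number; y: number; width: number; height: number };
type TextBlock = {
    bbox: BoundingBox;
    text: string;
    isCentered?: boolean;
    isFootnote?: boolean;
    isHeading?: boolean;
    isPoetic?: boolean;
};

// Hand-written sample data standing in for reconstructed blocks.
const blocks: TextBlock[] = [
    { bbox: { x: 300, y: 80, width: 400, height: 32 }, text: 'Chapter One', isCentered: true, isHeading: true },
    { bbox: { x: 100, y: 140, width: 800, height: 20 }, text: 'It was a quiet morning in the village.' },
    { bbox: { x: 350, y: 200, width: 300, height: 20 }, text: 'The wind is old and still at play', isCentered: true, isPoetic: true },
    { bbox: { x: 100, y: 900, width: 500, height: 16 }, text: '1. See the appendix for details.', isFootnote: true },
];

// Separate main content from footnotes.
const mainContent = blocks.filter((block) => !block.isFootnote);
const footnotes = blocks.filter((block) => block.isFootnote);

// Build a simple outline from the blocks flagged as headings.
const outline = blocks.filter((block) => block.isHeading).map((block) => block.text);

// Render the main content, marking headings and keeping each block on its own line.
const body = mainContent
    .map((block) => (block.isHeading ? `# ${block.text}` : block.text))
    .join('\n');

console.log(outline);
console.log(body);
console.log(`${footnotes.length} footnote(s)`);
```

Because every flag is optional in this sketch, an unset flag is treated the same as `false` when filtering.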
See Also
- Observation - The base type for OCR observations
- BoundingBox - Position and dimension information
- reconstructParagraphs - Main function that produces TextBlocks