Overview
Kokokor can process documents with complex layouts by using layout elements such as rectangles and horizontal lines. These elements help identify structural components like headings and footnotes.
Layout Elements
Rectangles
Used to identify headings and boxed content. Text within rectangles is marked with isHeading: true.
Horizontal Lines
Used to detect footnotes. Text below the last horizontal line (outside rectangles) is marked with isFootnote: true.
Basic Layout Example
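The two marking rules described above can be sketched in a few lines. This is a self-contained illustration of the idea, not Kokokor's implementation or API: the `Box`, `Observation`, and `Marked` types and the `markLayout` function are hypothetical names, and the sketch assumes that y coordinates grow downward, so the "last" horizontal line is the one with the largest y.

```typescript
// Hypothetical types for illustration only; Kokokor's real types may differ.
type Box = { x: number; y: number; width: number; height: number };
type Observation = { text: string; bbox: Box };
type Marked = Observation & { isHeading?: boolean; isFootnote?: boolean };

// True when `inner` lies entirely within `outer`.
function contains(outer: Box, inner: Box): boolean {
  return (
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.width <= outer.x + outer.width &&
    inner.y + inner.height <= outer.y + outer.height
  );
}

function markLayout(
  observations: Observation[],
  rectangles: Box[],
  horizontalLines: Box[],
): Marked[] {
  // Assumption: y grows downward, so the last line on the page has the largest y.
  const lastLineY =
    horizontalLines.length > 0
      ? Math.max(...horizontalLines.map((line) => line.y))
      : undefined;

  return observations.map((obs): Marked => {
    // Text inside any rectangle is a heading.
    if (rectangles.some((rect) => contains(rect, obs.bbox))) {
      return { ...obs, isHeading: true };
    }
    // Text outside rectangles and below the last line is a footnote.
    if (lastLineY !== undefined && obs.bbox.y > lastLineY) {
      return { ...obs, isFootnote: true };
    }
    return obs;
  });
}

const sample = markLayout(
  [
    { text: "Chapter 1", bbox: { x: 110, y: 60, width: 200, height: 30 } },
    { text: "Body text", bbox: { x: 50, y: 200, width: 500, height: 20 } },
    { text: "1. A note", bbox: { x: 50, y: 720, width: 300, height: 15 } },
  ],
  [{ x: 100, y: 50, width: 400, height: 50 }], // one rectangle around the heading
  [{ x: 50, y: 700, width: 500, height: 1 }], // one horizontal line above the footnote
);
```

In this sample, "Chapter 1" falls inside the rectangle and is marked as a heading, "1. A note" sits below the only horizontal line and is marked as a footnote, and "Body text" is left unmarked.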
Multi-Column Document
Kokokor’s current paragraph detection works best with single-column layouts. For multi-column documents, pre-process observations to group by column, or process each column separately.
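One possible way to do the pre-processing described above is to split observations into columns before handing each column to the paragraph logic. The sketch below is an assumption-laden illustration rather than part of Kokokor: `groupByColumn` and `minGap` are hypothetical names, columns are assumed to be separated by a horizontal gap wider than `minGap`, and columns are returned in left-to-right order (reverse the result for right-to-left documents).

```typescript
// Hypothetical types for illustration only; Kokokor's real types may differ.
type Box = { x: number; y: number; width: number; height: number };
type Observation = { text: string; bbox: Box };

// Groups observations into columns separated by horizontal gaps wider than `minGap`.
function groupByColumn(observations: Observation[], minGap: number): Observation[][] {
  // Sweep from left to right by each observation's left edge.
  const sorted = [...observations].sort((a, b) => a.bbox.x - b.bbox.x);
  const columns: Observation[][] = [];
  let rightEdge = -Infinity; // right-most edge seen in the current column

  for (const obs of sorted) {
    const current = columns[columns.length - 1];
    if (current === undefined || obs.bbox.x > rightEdge + minGap) {
      // A wide enough gap (or the very first observation) starts a new column.
      columns.push([obs]);
      rightEdge = obs.bbox.x + obs.bbox.width;
    } else {
      current.push(obs);
      rightEdge = Math.max(rightEdge, obs.bbox.x + obs.bbox.width);
    }
  }

  // Within each column, restore top-to-bottom reading order.
  return columns.map((column) => column.sort((a, b) => a.bbox.y - b.bbox.y));
}

const columns = groupByColumn(
  [
    { text: "C", bbox: { x: 320, y: 100, width: 200, height: 20 } },
    { text: "B", bbox: { x: 50, y: 130, width: 200, height: 20 } },
    { text: "A", bbox: { x: 50, y: 100, width: 200, height: 20 } },
    { text: "D", bbox: { x: 320, y: 130, width: 200, height: 20 } },
  ],
  20,
);
const columnTexts = columns.map((column) => column.map((obs) => obs.text).join(","));
```

Each resulting column can then be processed as if it were a single-column page. A gap-based sweep like this is deliberately simple; pages with full-width headings or irregular columns would need a more careful grouping step.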
Complete Example with All Features
Utility Functions
filterHorizontalLinesOutsideRectangles
Filters out horizontal lines that are contained within rectangles, keeping only the lines outside them.
calculateDPI
Calculates DPI from image and PDF dimensions.
mapMatrixToBoundingBox
Converts array-format bounding boxes to object format.
Configuration Options
Array of rectangle coordinates for heading detection. Text within rectangles is marked with isHeading: true.
Array of horizontal line coordinates for footnote detection. Text below the last line (outside rectangles) is marked with isFootnote: true.
Optional symbol to insert before the first footnote. Common values: '___', '---', '***'.
Tolerance in pixels (at 72 DPI) for determining if a line is inside a rectangle.
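To illustrate how a tolerance specified at 72 DPI might interact with the actual scan resolution, here is a minimal sketch. It does not reproduce Kokokor's own utilities: `dpiFromDimensions`, `isLineInsideRectangle`, and `linesOutsideRectangles` are hypothetical names, the PDF width is assumed to be in points (72 per inch), and the tolerance is assumed to scale linearly with DPI.

```typescript
// Hypothetical type for illustration only; Kokokor's real types may differ.
type Box = { x: number; y: number; width: number; height: number };

// PDF page sizes are measured in points (72 per inch), so pixels / points * 72 gives DPI.
function dpiFromDimensions(imageWidthPx: number, pdfWidthPt: number): number {
  return (imageWidthPx / pdfWidthPt) * 72;
}

// True when `line` fits inside `rect` once the rectangle is grown by the scaled tolerance.
function isLineInsideRectangle(
  line: Box,
  rect: Box,
  toleranceAt72: number,
  dpi: number,
): boolean {
  // Assumption: a tolerance given at 72 DPI scales linearly with the actual DPI.
  const t = toleranceAt72 * (dpi / 72);
  return (
    line.x >= rect.x - t &&
    line.y >= rect.y - t &&
    line.x + line.width <= rect.x + rect.width + t &&
    line.y + line.height <= rect.y + rect.height + t
  );
}

// Keeps only the lines that are not inside any rectangle.
function linesOutsideRectangles(
  lines: Box[],
  rects: Box[],
  toleranceAt72: number,
  dpi: number,
): Box[] {
  return lines.filter(
    (line) => !rects.some((rect) => isLineInsideRectangle(line, rect, toleranceAt72, dpi)),
  );
}

// A 1224 px wide image of a 612 pt wide page is scanned at 144 DPI.
const dpi = dpiFromDimensions(1224, 612);
const kept = linesOutsideRectangles(
  [
    { x: 95, y: 150, width: 410, height: 1 }, // slightly overflows the box; within tolerance
    { x: 50, y: 700, width: 500, height: 1 }, // well outside the box
  ],
  [{ x: 100, y: 100, width: 400, height: 100 }],
  5,
  dpi,
);
```

With a tolerance of 5 pixels at 72 DPI, the effective tolerance at 144 DPI is 10 pixels, so the first line counts as inside the rectangle even though it overflows by 5 pixels on each side. Only the second line remains as a footnote separator candidate.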
Best Practices
Heading Detection: Text is marked as a heading only if it is contained within a rectangle. Use your OCR system’s layout analysis to identify heading boxes.
Advanced: Custom Layout Processing
For complex layouts, use the low-level API.
See Also
Simple OCR: Basic paragraph reconstruction
Poetry Documents: Preserve poetic formatting
API Reference: Complete API documentation
Arabic Text: RTL text processing