Overview
Kokokor provides a simple, one-shot API called reconstructParagraphs that handles the complete pipeline of converting OCR observations into properly formatted text with intelligent paragraph breaks.
Quick Start
The recommended way to use Kokokor is through the reconstructParagraphs function:
Input Format
The reconstructParagraphs function expects two main inputs:
Observations
An array of OCR text observations, where each observation contains:
- The recognized text content
- The bounding box defining position and dimensions
Page Context
Page metadata required for accurate reconstruction:
- Page width in pixels
- Page height in pixels
- Horizontal DPI (dots per inch)
- Vertical DPI (dots per inch)
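The two inputs described above can be sketched as TypeScript types. This is only an illustration: the type names and field names used here (BoundingBox, Observation, PageContext, x, y, dpiX, dpiY, and so on) are assumptions rather than the library's confirmed definitions, so check the package's exported types before relying on them. The final lines show one general reason DPI is useful: it lets pixel dimensions be converted to physical units.

```typescript
// Hypothetical shapes for illustration only; the real field names may differ.
type BoundingBox = { x: number; y: number; width: number; height: number };

type Observation = {
    text: string; // the recognized text content
    bbox: BoundingBox; // position and dimensions on the page
};

type PageContext = {
    width: number; // page width in pixels
    height: number; // page height in pixels
    dpiX: number; // horizontal DPI
    dpiY: number; // vertical DPI
};

const observations: Observation[] = [
    { text: 'A Short Title', bbox: { x: 900, y: 200, width: 750, height: 60 } },
    { text: 'The first prose line', bbox: { x: 300, y: 320, width: 1950, height: 40 } },
];

// A US Letter page scanned at 300 DPI.
const page: PageContext = { width: 2550, height: 3300, dpiX: 300, dpiY: 300 };

// DPI relates pixels to physical size: pixels / DPI = inches.
const pageWidthInches = page.width / page.dpiX;
const pageHeightInches = page.height / page.dpiY;
```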
Output Structure
The reconstructParagraphs function returns a comprehensive result object:
Lines
An array of TextBlock objects representing individual text lines with metadata:
- text - The line content
- bbox - Position and dimensions
- isCentered - Whether the line is centered
- isHeading - Whether it’s identified as a heading
- isFootnote - Whether it’s identified as a footnote
- isPoetic - Whether it’s identified as poetry
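As a rough sketch, the line metadata listed above maps onto a type like the one below. The property names come from the list, but the inner shape of bbox and the choice to make the flags optional are assumptions made for this example. The filter at the end shows how the flags might be used to pick out headings.

```typescript
// Sketch of a TextBlock-like shape; the bbox fields and the optionality
// of the flags are assumptions, not the library's confirmed definition.
type TextBlock = {
    text: string;
    bbox: { x: number; y: number; width: number; height: number };
    isCentered?: boolean;
    isHeading?: boolean;
    isFootnote?: boolean;
    isPoetic?: boolean;
};

const sampleLines: TextBlock[] = [
    {
        text: 'A Short Title',
        bbox: { x: 900, y: 200, width: 750, height: 60 },
        isCentered: true,
        isHeading: true,
    },
    { text: 'The first prose line', bbox: { x: 300, y: 320, width: 1950, height: 40 } },
    {
        text: 'A line of verse,',
        bbox: { x: 700, y: 400, width: 1100, height: 40 },
        isCentered: true,
        isPoetic: true,
    },
];

// Use the metadata flags to select particular kinds of lines.
const headingTexts = sampleLines.filter((line) => line.isHeading).map((line) => line.text);
const poeticCount = sampleLines.filter((line) => line.isPoetic).length;
```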
Paragraphs
An array of TextBlock objects where consecutive prose lines have been merged into paragraphs. Poetry lines remain separate to preserve formatting.
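To make the merging behavior concrete, here is a deliberately simplified sketch. It is not Kokokor's algorithm: the helper mergeProseLines and the Line type are hypothetical, and the sketch only looks at the isPoetic and isHeading flags. A real implementation would presumably also use geometric cues to decide where one prose paragraph ends and the next begins.

```typescript
// Minimal line shape with only the fields this sketch needs (hypothetical).
type Line = { text: string; isPoetic?: boolean; isHeading?: boolean };

// Toy merge: consecutive prose lines are joined with a space, while poetic
// lines and headings each stay in their own block.
function mergeProseLines(lines: Line[]): Line[] {
    const blocks: Line[] = [];
    for (const line of lines) {
        const prev = blocks.length > 0 ? blocks[blocks.length - 1] : undefined;
        const isProse = !line.isPoetic && !line.isHeading;
        if (prev && isProse && !prev.isPoetic && !prev.isHeading) {
            // Extend the previous prose block instead of starting a new one.
            prev.text = `${prev.text} ${line.text}`;
        } else {
            // Copy the line so the input array is never mutated.
            blocks.push({ ...line });
        }
    }
    return blocks;
}

const merged = mergeProseLines([
    { text: 'A Short Title', isHeading: true },
    { text: 'The first prose line' },
    { text: 'continues on the next line.' },
    { text: 'A line of verse,', isPoetic: true },
    { text: 'and another line of verse.', isPoetic: true },
]);
const mergedTexts = merged.map((block) => block.text);
```

In this sample the two prose lines collapse into one block, while the heading and both verse lines stay as separate blocks.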
Text
The final formatted text output as a string, with:
- Proper paragraph breaks (double newlines)
- Poetry lines preserved individually
- Headings followed by blank lines
- Optional footer symbols before footnotes
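The formatting rules above can be illustrated with a small sketch. The function formatBlocks, the Block type, and the footerSymbol parameter are hypothetical names for this example, not the library's API. The sketch also assumes that consecutive poetry lines are separated by a single newline and that the footer symbol appears once, on its own line, before the first footnote. Neither detail is confirmed by this page.

```typescript
// Hypothetical block shape with only the fields this sketch needs.
type Block = { text: string; isPoetic?: boolean; isFootnote?: boolean };

function formatBlocks(blocks: Block[], footerSymbol?: string): string {
    let output = '';
    let footerEmitted = false;
    for (let i = 0; i < blocks.length; i++) {
        const block = blocks[i];
        const prev = i > 0 ? blocks[i - 1] : undefined;
        if (!block) continue;
        if (prev) {
            // Assumption: adjacent poetry lines get one newline; everything
            // else (paragraphs, headings) is separated by a blank line.
            output += prev.isPoetic && block.isPoetic ? '\n' : '\n\n';
        }
        if (block.isFootnote && footerSymbol && !footerEmitted) {
            // Assumption: the symbol is placed once, before the first footnote.
            output += `${footerSymbol}\n`;
            footerEmitted = true;
        }
        output += block.text;
    }
    return output;
}

const formatted = formatBlocks(
    [
        { text: 'A Short Title' },
        { text: 'First paragraph.' },
        { text: 'Verse one,', isPoetic: true },
        { text: 'verse two.', isPoetic: true },
        { text: '1. A footnote.', isFootnote: true },
    ],
    '_____',
);
```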
Complete Example
Here’s a complete example from OCR output to formatted text:
Next Steps
- Advanced Configuration: Fine-tune line detection, paragraph grouping, and poetry detection
- Surya Integration: Convert Surya OCR output to Kokokor format