Function Signature
src/utils/paragraphs.ts:82
Description
Converts raw OCR observations into structured text lines with rich metadata. This is the first stage of the paragraph reconstruction pipeline. The function performs several operations:
- Groups observations into lines based on vertical proximity
- Detects centered text (titles, poetry)
- Identifies headings (text within rectangles)
- Detects footnotes (text below horizontal lines)
- Performs poetry detection to preserve poetic formatting
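The first two operations above can be illustrated with a minimal sketch. It is not the library's implementation: the `BBox` and `Observation` types, the helpers `groupIntoLines` and `isCentered`, and the fixed pixel `tolerance` are all hypothetical names chosen for illustration.

```typescript
// Hypothetical shapes for illustration; the library's real types may differ.
type BBox = { x: number; y: number; width: number; height: number };
type Observation = { text: string; bbox: BBox };

// Group observations whose vertical center lies within `tolerance` pixels of
// the first observation on the current line, then sort each line left to right.
function groupIntoLines(observations: Observation[], tolerance: number): Observation[][] {
    const centerY = (o: Observation): number => o.bbox.y + o.bbox.height / 2;
    const sorted = [...observations].sort((a, b) => centerY(a) - centerY(b));
    const lines: Observation[][] = [];
    let lineCenter = Number.NEGATIVE_INFINITY;

    for (const obs of sorted) {
        const current = lines[lines.length - 1];
        if (current !== undefined && Math.abs(centerY(obs) - lineCenter) <= tolerance) {
            current.push(obs);
        } else {
            lines.push([obs]);
            lineCenter = centerY(obs);
        }
    }
    return lines.map((line) => [...line].sort((a, b) => a.bbox.x - b.bbox.x));
}

// A line counts as centered when its midpoint is within `tolerance` pixels
// of the page's horizontal midpoint.
function isCentered(bbox: BBox, pageWidth: number, tolerance: number): boolean {
    return Math.abs(bbox.x + bbox.width / 2 - pageWidth / 2) <= tolerance;
}

const grouped = groupIntoLines(
    [
        { text: 'world', bbox: { x: 60, y: 11, width: 45, height: 10 } },
        { text: 'Hello', bbox: { x: 10, y: 10, width: 40, height: 10 } },
        { text: 'Second', bbox: { x: 10, y: 40, width: 50, height: 10 } },
    ],
    5,
);
console.log(grouped.map((line) => line.map((o) => o.text).join(' ')));
// prints [ 'Hello world', 'Second' ]
```

The sketch uses a fixed tolerance for simplicity. The function documented here adapts its line height to the input, so real grouping thresholds will vary by page.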
Parameters
Array of OCR text observations to process. Each observation contains:
- `text`: The recognized text string
- `bbox`: Bounding box with position (x, y) and dimensions (width, height)
Page dimensions and DPI information:
- `width`: Page width in pixels
- `height`: Page height in pixels
- `dpiX`: Horizontal DPI for coordinate normalization
- `dpiY`: Vertical DPI for line grouping
Configuration options for text line processing.
Returns
Array of text blocks with metadata. Each TextBlock contains:
- `text`: The merged text content of the line
- `bbox`: Bounding box covering the entire line
- `isCentered`: Whether the line is centered on the page
- `isHeading`: Whether the line is within a rectangle (heading)
- `isFootnote`: Whether the line appears below a horizontal line
- `isPoetic`: Whether the line is identified as poetry/verse
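To make the returned shape concrete, here is a minimal sketch of how one grouped line could be folded into a single block. The `TextBlock` fields follow the list above. The `BoundingBox` and `Observation` type names, the `mergeLine` helper, and the choice to default every flag to `false` are assumptions made for illustration, not the library's code.

```typescript
// Field names follow the documentation; the interface names other than
// TextBlock are hypothetical.
interface BoundingBox { x: number; y: number; width: number; height: number }
interface Observation { text: string; bbox: BoundingBox }
interface TextBlock {
    text: string;
    bbox: BoundingBox;
    isCentered: boolean;
    isHeading: boolean;
    isFootnote: boolean;
    isPoetic: boolean;
}

// Merge the observations of one line: join the text with spaces and take the
// smallest box that covers every observation. Flags default to false here;
// a real pipeline would set them from its own detection steps.
function mergeLine(line: Observation[]): TextBlock {
    if (line.length === 0) {
        throw new Error('mergeLine requires at least one observation');
    }
    const left = Math.min(...line.map((o) => o.bbox.x));
    const top = Math.min(...line.map((o) => o.bbox.y));
    const right = Math.max(...line.map((o) => o.bbox.x + o.bbox.width));
    const bottom = Math.max(...line.map((o) => o.bbox.y + o.bbox.height));
    return {
        text: line.map((o) => o.text).join(' '),
        bbox: { x: left, y: top, width: right - left, height: bottom - top },
        isCentered: false,
        isHeading: false,
        isFootnote: false,
        isPoetic: false,
    };
}
```

For two observations at x = 10 (width 40) and x = 60 (width 45), the merged box starts at x = 10 and is 95 pixels wide.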
Example
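A minimal sketch of inputs shaped as the Parameters section describes. The type names (`Observation`, `PageDimensions`, `LineOptions`) and the sample values are hypothetical. Placing `isRTL` and `lineHeightFactor` on the options object is an assumption based on the Notes.

```typescript
// Hypothetical type names; the field names follow the Parameters section.
type Observation = {
    text: string;
    bbox: { x: number; y: number; width: number; height: number };
};
type PageDimensions = { width: number; height: number; dpiX: number; dpiY: number };
// Assumption: both fields mentioned in the Notes live on the options object.
type LineOptions = { isRTL?: boolean; lineHeightFactor?: number };

// Two observations that sit on the same visual line of the page.
const sampleObservations: Observation[] = [
    { text: 'Hello', bbox: { x: 100, y: 200, width: 120, height: 40 } },
    { text: 'world', bbox: { x: 240, y: 202, width: 130, height: 40 } },
];

// An A4 page scanned at 300 DPI.
const samplePage: PageDimensions = { width: 2480, height: 3508, dpiX: 300, dpiY: 300 };

// lineHeightFactor is omitted so that the adaptive line height applies.
const sampleOptions: LineOptions = { isRTL: false };
```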
Notes
- Observations are preprocessed by filtering noise and normalizing coordinates
- For RTL text, set `isRTL: true` to flip x-coordinates appropriately
- Poetry detection uses multiple heuristics: paired hemistichs, word density, and centering
- Lines marked as `isPoetic` will not be merged into paragraphs by `mapTextLinesToParagraphs`
- The function automatically calculates adaptive line height if `lineHeightFactor` is not provided
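
The x-coordinate flip mentioned for RTL text can be pictured as mirroring each box across the page's vertical axis. The sketch below shows one common way to do this. `BBox` and `flipX` are hypothetical names, and the library's own transformation may differ in detail.

```typescript
// Hypothetical bounding box shape for illustration.
type BBox = { x: number; y: number; width: number; height: number };

// Mirror a box horizontally so that x is measured from the opposite page edge.
// The box keeps its size; only its horizontal position changes.
function flipX(bbox: BBox, pageWidth: number): BBox {
    return { ...bbox, x: pageWidth - (bbox.x + bbox.width) };
}

// A box 100px from the left edge of a 1000px page ends up 100px from the right edge.
const flipped = flipX({ x: 100, y: 20, width: 50, height: 10 }, 1000);
console.log(flipped.x); // prints 850
```

After the flip, sorting boxes by ascending x yields right-to-left reading order, so the same left-to-right grouping logic can be reused.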