Overview
Kokokor uses layout heuristics to detect and preserve poetry formatting. Poetry is kept as separate lines rather than being merged into paragraphs, which preserves the line structure of the verse.
Poetry Detection Methods
Kokokor identifies poetry using three complementary heuristics:
Poetry Pairs (Hemistichs)
Two lines with similar width and word count that are centered as a unit. Common in Arabic and classical poetry.
Detection criteria:
- Similar widths (within 40% difference by default)
- Similar word counts (within 50% difference)
- Centered when considered together
- Compatible vertical spacing
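The pair criteria above can be sketched as a single predicate. This is an illustrative sketch, not Kokokor's implementation: the `Line` shape, the option names in `PAIR_DEFAULTS`, and the function name are hypothetical, and measuring the vertical gap as the difference between the two top edges is an assumption. The thresholds are the defaults stated on this page.

```typescript
// Hypothetical line shape; Kokokor's real observation type may differ.
interface Line {
  x: number;        // left edge
  y: number;        // top edge
  width: number;
  height: number;
  wordCount: number;
}

// Hypothetical option names; values are the defaults quoted in this page.
const PAIR_DEFAULTS = {
  maxWidthDiffRatio: 0.4,     // relative to the average width of the pair
  maxWordCountDiffRatio: 0.5, // relative to the larger word count
  maxVerticalGapRatio: 2.0,   // relative to the average line height
  centerToleranceRatio: 0.05, // relative to page width
};

function looksLikeHemistichPair(
  a: Line,
  b: Line,
  pageWidth: number,
  opts = PAIR_DEFAULTS,
): boolean {
  const avgWidth = (a.width + b.width) / 2;
  const avgHeight = (a.height + b.height) / 2;
  const maxWords = Math.max(a.wordCount, b.wordCount);
  if (avgWidth === 0 || avgHeight === 0 || maxWords === 0) return false;

  const widthDiff = Math.abs(a.width - b.width) / avgWidth;
  const wordDiff = Math.abs(a.wordCount - b.wordCount) / maxWords;
  // Assumption: the gap is the distance between the two top edges.
  const verticalGap = Math.abs(a.y - b.y) / avgHeight;

  // Centering is evaluated on the bounding box of both halves together.
  const left = Math.min(a.x, b.x);
  const right = Math.max(a.x + a.width, b.x + b.width);
  const centerOffset = Math.abs((left + right) / 2 - pageWidth / 2) / pageWidth;

  return (
    widthDiff <= opts.maxWidthDiffRatio &&
    wordDiff <= opts.maxWordCountDiffRatio &&
    verticalGap <= opts.maxVerticalGapRatio &&
    centerOffset <= opts.centerToleranceRatio
  );
}
```

Evaluating the centering on the combined bounding box rather than on each half is what lets two off-center halves still qualify as one centered verse.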
Wide Poetic Lines
Single lines that are centered with lower word density than prose.
Detection criteria:
- Centered on the page
- Width ≥ 60% of page width
- Word density < 80% of average prose density
- At least 2 words (configurable)
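As a rough illustration of the wide-line criteria, the sketch below combines the width, density, and word-count checks. It is not Kokokor's code: the type and option names are hypothetical, word density is assumed to mean words per unit of line width, and the centering result is passed in as a boolean so the example stays focused on the other three checks.

```typescript
// Hypothetical input shape for a single candidate line.
interface WideLineCandidate {
  width: number;
  wordCount: number;
  centered: boolean; // outcome of a separate centering check
}

// Hypothetical option names; values are the defaults quoted in this page.
interface WideLineOptions {
  minWidthRatio: number | null; // null disables wide poetry detection
  maxDensityRatio: number;
  minWordCount: number;
}

const WIDE_DEFAULTS: WideLineOptions = {
  minWidthRatio: 0.6,
  maxDensityRatio: 0.8,
  minWordCount: 2,
};

function looksLikeWidePoeticLine(
  line: WideLineCandidate,
  pageWidth: number,
  proseDensity: number, // baseline words per unit width for prose
  opts: WideLineOptions = WIDE_DEFAULTS,
): boolean {
  if (opts.minWidthRatio === null) return false; // detection disabled
  if (!line.centered) return false;
  if (line.wordCount < opts.minWordCount) return false;
  if (line.width < opts.minWidthRatio * pageWidth) return false;

  // Assumption: density is words divided by line width.
  const density = line.wordCount / line.width;
  // The boundary case (exactly 80%) is treated as poetry here; that is a guess.
  return density <= opts.maxDensityRatio * proseDensity;
}
```

The intuition is that a verse line spans much of the page but carries fewer words than a prose line of the same width, because of wider spacing or justification.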
Centering Analysis
Both methods use centering detection with configurable tolerances.
Parameters:
- centerToleranceRatio: How close to center (default: 5% of page width)
- minMarginRatio: Minimum whitespace on each side (default: 10%)
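A minimal sketch of how these two parameters could combine into a centering test follows. The parameter names `centerToleranceRatio` and `minMarginRatio` come from this page; the `Span` shape, the function name, and the exact comparison logic are assumptions made for illustration.

```typescript
// Hypothetical horizontal extent of a line (or of a merged pair).
interface Span {
  x: number;     // left edge
  width: number; // horizontal extent
}

interface CenteringOptions {
  centerToleranceRatio: number;
  minMarginRatio: number;
}

// Defaults as documented: 5% tolerance, 10% margin on each side.
const CENTERING_DEFAULTS: CenteringOptions = {
  centerToleranceRatio: 0.05,
  minMarginRatio: 0.1,
};

function isCentered(
  span: Span,
  pageWidth: number,
  opts: CenteringOptions = CENTERING_DEFAULTS,
): boolean {
  const spanCenter = span.x + span.width / 2;
  const offset = Math.abs(spanCenter - pageWidth / 2);
  const leftMargin = span.x;
  const rightMargin = pageWidth - (span.x + span.width);

  return (
    offset <= opts.centerToleranceRatio * pageWidth &&
    leftMargin >= opts.minMarginRatio * pageWidth &&
    rightMargin >= opts.minMarginRatio * pageWidth
  );
}
```

The margin check matters because a full-width prose line also has its midpoint at the page center; requiring whitespace on both sides rules that case out.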
Basic Poetry Example
Arabic Poetry (Hemistichs)
Arabic poetry often uses hemistichs, the two balanced halves of a verse.
Mixed Prose and Poetry Document
Configuration Options
Minimum number of words for a line to be considered poetry. Filters out noise like page numbers.
centerToleranceRatio: How close to center a line must be (as ratio of page width). 0.05 = within 5% of page width from true center.
minMarginRatio: Minimum whitespace required on each side (as ratio of page width). 0.1 = 10% margin on each side.
For wide poetry: maximum word density as ratio of prose density. 0.8 = poetry must have ≤80% of prose density.
Minimum width for wide poetry lines (as ratio of page width). Set to null to disable wide poetry detection.
For hemistichs: maximum width difference (as ratio of average width). 0.4 = widths can differ by up to 40%.
For hemistichs: maximum word count difference (as ratio of max count). 0.5 = counts can differ by up to 50%.
For hemistichs: maximum vertical gap (as ratio of average height). 2.0 = gap can be up to 200% of line height.
Delimiter used when merging hemistichs. Use ' ... ' for traditional Arabic poetry formatting.
How Poetry Detection Works
Calculate Prose Baseline
Kokokor analyzes the entire document to calculate average word density for prose content. This serves as a baseline for comparison.
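One plausible way to compute such a baseline is shown below. This is a sketch under assumptions, not the library's method: density is taken to be words per unit of line width, the baseline is the mean of per-line densities, and `MeasuredLine` and `averageWordDensity` are hypothetical names.

```typescript
// Hypothetical minimal line measurement.
interface MeasuredLine {
  width: number;
  wordCount: number;
}

// Mean of per-line densities (words / width), skipping zero-width lines.
function averageWordDensity(lines: MeasuredLine[]): number {
  let total = 0;
  let counted = 0;
  for (const line of lines) {
    if (line.width <= 0) continue; // avoid division by zero
    total += line.wordCount / line.width;
    counted += 1;
  }
  return counted > 0 ? total / counted : 0;
}
```

A line whose own density falls well below this value is carrying unusually few words for its width, which is the signal the wide-line check relies on.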
Check Wide Lines
For single lines that are wide enough (≥60% of page width by default), Kokokor checks:
- Is it centered with sufficient margins?
- Is word density lower than prose baseline?
- Does it have enough words (not just fragments)?
Check Pairs
For pairs of observations on the same line, Kokokor checks:
- Do they have similar widths?
- Do they have similar word counts?
- Are they centered when combined?
- Is the vertical gap appropriate?
Best Practices
Prose Punctuation: Kokokor automatically excludes lines containing parentheses, commas, or semicolons from wide poetry detection, because these marks are more common in prose than in verse.
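A filter of this kind can be approximated with a single character class. The sketch below is illustrative only: the function name is hypothetical, and including the Arabic comma (U+060C) and Arabic semicolon (U+061B) alongside the Latin marks is an assumption that this page does not confirm.

```typescript
// Parentheses, comma, semicolon, plus their Arabic counterparts (assumed).
const PROSE_PUNCTUATION = /[(),;\u060C\u061B]/;

// Returns true when a line contains punctuation typical of prose,
// in which case it would be skipped by wide poetry detection.
function hasProsePunctuation(text: string): boolean {
  return PROSE_PUNCTUATION.test(text);
}
```

In practice this means a centered, sparse line such as a parenthetical caption would be treated as prose rather than verse.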
See Also
Arabic Text
RTL text processing and Arabic hemistichs
Multi-column
Complex layouts with headings and footnotes