Kokokor uses sophisticated heuristics to group text lines into paragraphs. Two key parameters control this behavior: verticalJumpFactor and widthTolerance. Understanding how these work helps you tune paragraph detection for different document types.
The verticalJumpFactor determines when a vertical gap between lines is large enough to indicate a new paragraph. It works by comparing consecutive gaps:
// A new paragraph starts when:// currentGap > previousGap * verticalJumpFactor
// verticalJumpFactor: 1.5// Even small spacing increases create new paragraphsconst result = reconstructParagraphs(input, { paragraph: { verticalJumpFactor: 1.5 }});// Example: gap of 40px after 30px → new paragraph// 40 > 30 * 1.5 (45)? No// But gap of 50px after 30px → new paragraph// 50 > 30 * 1.5 (45)? Yes
The vertical jump signal only activates when the preceding lines are full-width (not short lines). This prevents false breaks after natural line endings.
// This will NOT trigger a vertical break:This is a long line that ends the paragraph.Short line. ← Short line (big gap)Next paragraph. ← Gap is ignored due to short line above// The short line already signals the paragraph break,// so vertical jump detection is suppressed
The widthTolerance determines what constitutes a “short line” that indicates a paragraph ending. Lines narrower than this threshold trigger a new paragraph for the following line.
// Example document:This is a long line of text that continues. (width: 420)This is another long line in same paragraph. (width: 415)Short line. (width: 300) (gap: 50px)This starts a new paragraph with more text. (width: 410)And this continues that paragraph. (width: 405) (gap: 80px, previous gap: 25px)This is another paragraph after big gap. (width: 418)// With defaults (verticalJumpFactor: 2.0, widthTolerance: 0.85):// Reference width (p75): ~415// Threshold width: 415 * 0.85 = 352.75// Paragraph 1: Lines 1-2// Line 3 is short (300 < 352.75) → triggers break// Paragraph 2: Lines 4-5// Gap of 80px vs previous 25px// 80 > 25 * 2.0? Yes → triggers break// Paragraph 3: Line 6
Academic papers often have consistent spacing and few short lines:
const result = reconstructParagraphs(input, { paragraph: { verticalJumpFactor: 1.8, // Sensitive to spacing widthTolerance: 0.90 // Only very short lines }});
Technical docs may have lists, code blocks, and varied formatting:
const result = reconstructParagraphs(input, { paragraph: { verticalJumpFactor: 2.5, // Less sensitive widthTolerance: 0.75 // More short line breaks }});