Function Signature

export const mapTextLinesToParagraphs = (
  textLines: TextBlock[],
  options: ParagraphOptions = {}
): TextBlock[]
Location: src/utils/paragraphs.ts:236

Description

Groups text lines into coherent paragraphs while handling both prose and poetry content appropriately. This is the second stage of the paragraph reconstruction pipeline. The function:
  • Merges consecutive prose lines into paragraphs based on vertical spacing and line width patterns
  • Preserves poetic lines individually to maintain their formatting
  • Processes body content and footnotes separately
  • Uses enhanced paragraph detection with robust geometry heuristics
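The flow described above can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: the Line type, groupLines, sketchParagraphs, and the single vertical-gap rule are all assumptions made for the example, and the real detector uses more heuristics than this.

```typescript
// Hypothetical shapes for illustration; the real TextBlock type may differ.
type BBox = { x: number; y: number; width: number; height: number };
type Line = { text: string; bbox: BBox; isPoetic?: boolean; isFootnote?: boolean };

// Assumed rule: start a new paragraph when the distance between line tops
// exceeds jumpFactor times the previous line's height.
const groupLines = (lines: Line[], jumpFactor = 2): string[] => {
  const paragraphs: string[] = [];
  let current: Line[] = [];

  // Emit the prose lines collected so far as one paragraph.
  const flush = () => {
    if (current.length > 0) {
      paragraphs.push(current.map((l) => l.text).join(' '));
      current = [];
    }
  };

  for (const line of lines) {
    if (line.isPoetic) {
      // Poetic lines close any open paragraph and are emitted on their own.
      flush();
      paragraphs.push(line.text);
      continue;
    }
    const prev = current[current.length - 1];
    if (prev && line.bbox.y - prev.bbox.y > jumpFactor * prev.bbox.height) {
      flush();
    }
    current.push(line);
  }
  flush();
  return paragraphs;
};

// Body and footnotes are grouped independently, then concatenated.
const sketchParagraphs = (lines: Line[]): string[] => [
  ...groupLines(lines.filter((l) => !l.isFootnote)),
  ...groupLines(lines.filter((l) => l.isFootnote)),
];

const demo = sketchParagraphs([
  { text: 'A', bbox: { x: 50, y: 100, width: 200, height: 15 } },
  { text: 'B', bbox: { x: 50, y: 120, width: 200, height: 15 } },
  { text: 'C', bbox: { x: 50, y: 180, width: 200, height: 15 } },
  { text: 'verse', bbox: { x: 80, y: 200, width: 90, height: 15 }, isPoetic: true },
  { text: 'note', bbox: { x: 50, y: 300, width: 150, height: 12 }, isFootnote: true },
]);
```

In this sketch, A and B merge because their tops are 20 units apart (below the 30-unit threshold), C starts a new paragraph, the poetic line stays alone, and the footnote is grouped in its own pass and appended last.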

Parameters

textLines: TextBlock[] (required)
Array of text lines to group into paragraphs, typically the output of mapObservationsToTextLines. Each TextBlock should contain:
  • text: The line content
  • bbox: Bounding box of the line
  • isPoetic: Whether the line is poetry (will not be merged)
  • isFootnote: Whether the line is a footnote (processed separately)

options: ParagraphOptions (default: {})
Paragraph detection settings, passed as an object.

Returns

TextBlock[]
Array of text blocks representing complete paragraphs.
  • Prose lines are merged into paragraph-level blocks
  • Poetic lines (isPoetic: true) are preserved individually
  • Each block contains merged text and a bounding box covering the entire paragraph
  • Body content and footnotes are processed separately then concatenated
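A bounding box that covers an entire paragraph is the union of its line boxes. The helper below is a minimal sketch of that computation; unionBBox is a hypothetical name used for illustration and is not part of the documented API.

```typescript
type BBox = { x: number; y: number; width: number; height: number };

// Smallest box containing every input box. Expects a non-empty array.
const unionBBox = (boxes: BBox[]): BBox => {
  const left = Math.min(...boxes.map((b) => b.x));
  const top = Math.min(...boxes.map((b) => b.y));
  const right = Math.max(...boxes.map((b) => b.x + b.width));
  const bottom = Math.max(...boxes.map((b) => b.y + b.height));
  return { x: left, y: top, width: right - left, height: bottom - top };
};

const merged = unionBBox([
  { x: 50, y: 100, width: 200, height: 15 },
  { x: 50, y: 120, width: 210, height: 15 },
  { x: 50, y: 140, width: 100, height: 15 },
]);
// merged is { x: 50, y: 100, width: 210, height: 55 }
```

The widest line sets the right edge (50 + 210 = 260) and the last line sets the bottom edge (140 + 15 = 155), which gives a width of 210 and a height of 55.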

Example

import { mapTextLinesToParagraphs } from 'kokokor';

const textLines = [
  {
    text: 'This is the first line.',
    bbox: { x: 50, y: 100, width: 200, height: 15 },
    isPoetic: false,
    isFootnote: false
  },
  {
    text: 'This is the second line.',
    bbox: { x: 50, y: 120, width: 210, height: 15 },
    isPoetic: false,
    isFootnote: false
  },
  {
    text: 'Short line.',
    bbox: { x: 50, y: 140, width: 100, height: 15 },
    isPoetic: false,
    isFootnote: false
  },
  {
    text: 'New paragraph here.',
    bbox: { x: 50, y: 180, width: 190, height: 15 },
    isPoetic: false,
    isFootnote: false
  }
];

const paragraphs = mapTextLinesToParagraphs(textLines, {
  verticalJumpFactor: 2,
  widthTolerance: 0.85
});

console.log(paragraphs);
// [
//   {
//     text: 'This is the first line. This is the second line. Short line.',
//     bbox: { x: 50, y: 100, width: 210, height: 55 },
//     isPoetic: false,
//     isFootnote: false
//   },
//   {
//     text: 'New paragraph here.',
//     bbox: { x: 50, y: 180, width: 190, height: 15 },
//     isPoetic: false,
//     isFootnote: false
//   }
// ]

Example with Poetry

import { mapTextLinesToParagraphs } from 'kokokor';

const mixedContent = [
  {
    text: 'This is a prose paragraph.',
    bbox: { x: 50, y: 100, width: 200, height: 15 },
    isPoetic: false
  },
  {
    text: 'Roses are red,',
    bbox: { x: 100, y: 150, width: 120, height: 15 },
    isPoetic: true
  },
  {
    text: 'Violets are blue.',
    bbox: { x: 100, y: 170, width: 130, height: 15 },
    isPoetic: true
  },
  {
    text: 'Another prose line here.',
    bbox: { x: 50, y: 220, width: 195, height: 15 },
    isPoetic: false
  }
];

const result = mapTextLinesToParagraphs(mixedContent);

// Poetry lines are preserved separately:
// [
//   { text: 'This is a prose paragraph.', ... },
//   { text: 'Roses are red,', isPoetic: true, ... },
//   { text: 'Violets are blue.', isPoetic: true, ... },
//   { text: 'Another prose line here.', ... }
// ]

Notes

  • The enhanced paragraph detector relies on outlier-resistant geometry: a 75th-percentile (p75) line width as the width baseline, and a robust x baseline
  • Poetic lines are never merged; they keep their original line breaks
  • Body content and footnotes are processed separately using the same heuristics
  • The function makes at most one paragraph-break decision per line, so a single line cannot trigger a double increment
  • Short-line interaction guards prevent premature paragraph breaks
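To illustrate why a p75 width baseline is robust, the sketch below computes a nearest-rank percentile and flags lines that are narrow relative to it. The function names, the nearest-rank method, and the tolerance comparison are assumptions for illustration; the library's actual calculation may differ.

```typescript
// Nearest-rank percentile over a non-empty list of numbers.
const percentile = (values: number[], p: number): number => {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const value = sorted[Math.max(0, rank - 1)];
  if (value === undefined) {
    throw new Error('percentile requires at least one value');
  }
  return value;
};

// Assumed rule: a line is "short" when narrower than tolerance * p75 width.
const isShortLine = (width: number, widths: number[], tolerance = 0.85): boolean =>
  width < tolerance * percentile(widths, 75);

const widths = [200, 210, 100, 190];
const baseline = percentile(widths, 75); // 200
const shortFlags = widths.map((w) => isShortLine(w, widths)); // [false, false, true, false]
```

Because the baseline is taken from the upper part of the width distribution, one short closing line (width 100 here) does not drag the reference width down the way a mean would.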
