Overview

Kokokor provides built-in support for right-to-left (RTL) text processing, essential for languages like Arabic, Hebrew, Farsi, and Urdu. The library handles coordinate transformation and normalization to ensure accurate text reconstruction.
RTL support is enabled by default (isRTL: true). For left-to-right languages, set isRTL: false in options.

RTL Coordinate System

The Challenge

OCR engines typically return coordinates in a left-to-right (LTR) coordinate system where:
  • Origin (0, 0) is at the top-left corner
  • X-axis increases rightward
  • Y-axis increases downward
For RTL text, this creates problems:
  • Text flows right-to-left, but coordinates are left-to-right
  • Logical reading order doesn’t match spatial order
  • Text alignment and centering calculations are inverted

The Solution: Coordinate Flipping

Kokokor transforms RTL coordinates by flipping the x-axis:
export const mapOcrResultToRTLObservations = (
  observations: Observation[],
  imageWidth: number
) => {
  return observations.map((o) => ({
    ...o,
    bbox: {
      ...o.bbox,
      x: imageWidth - o.bbox.x - o.bbox.width
    }
  }));
};
Reference: src/utils/normalization.ts:27

Transformation Formula

newX = imageWidth - originalX - textWidth
Before Transformation (LTR coordinates):
0                     imageWidth (800px)
|---------------------|------------------|
        Arabic text at x=100, width=50
After Transformation (RTL coordinates):
0                     imageWidth (800px)
|---------------------|------------------|
                 Arabic text at x=650
Calculation: 650 = 800 - 100 - 50
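The formula can be checked with a few lines of TypeScript. This is a minimal sketch: `flipX` is a hypothetical helper written for this illustration, not something Kokokor exports. It also shows that applying the flip twice restores the original coordinate.

```typescript
// Hypothetical helper for illustration; not part of Kokokor's API.
// Mirrors a box's left edge across the vertical centre line of the image.
const flipX = (imageWidth: number, originalX: number, textWidth: number): number =>
  imageWidth - originalX - textWidth;

const flipped = flipX(800, 100, 50);
console.log(flipped); // 650

// The flip is its own inverse: flipping again gives back the original x.
const restored = flipX(800, flipped, 50);
console.log(restored); // 100
```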

Processing Pipeline

Stage 1: Preprocessing

RTL transformation happens early in the pipeline during the flipAndAlignObservations step:
export const flipAndAlignObservations = (
  observations: Observation[],
  imageWidth: number,
  dpiX: number,
  options: Partial<Pick<MapObservationsToTextLinesOptions, 'isRTL' | 'log'>> = {}
) => {
  // 1. Filter noise
  observations = observations.filter(filterNoisyObservations);

  if (observations.length === 0) {
    return [];
  }

  // 2. Apply RTL coordinate flip
  if (options.isRTL) {
    observations = mapOcrResultToRTLObservations(observations, imageWidth);
  }

  // 3. Normalize x-coordinates for alignment
  return normalizeObservationsX(observations, dpiX);
};
Reference: src/utils/paragraphs.ts:43

Preprocessing Steps

Step 1: Noise Filtering

Remove invalid or noisy observations:
const filterNoisyObservations = (o: Observation) =>
  o.text?.replace(/[،,؛;؟?۔.:\-()]/g, '').length > 1;
Reference: src/utils/normalization.ts:54

Step 2: RTL Coordinate Flip

Transform x-coordinates for RTL text flow:
if (options.isRTL) {
  observations = mapOcrResultToRTLObservations(observations, imageWidth);
}

Step 3: X-Coordinate Normalization

Align observations to a clean left edge:
return normalizeObservationsX(observations, dpiX);
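The three steps can be run end to end on a small sample. The sketch below re-implements them locally from the snippets on this page so that it is self-contained. The `Observation` shape is a simplified assumption, the helper names are illustrative, and the sample coordinates are made up. Two right-aligned lines whose right edges differ by 2px end up sharing one left edge after the flip and normalization.

```typescript
// Simplified shapes assumed for this sketch; Kokokor's real types may carry more fields.
type BBox = { x: number; y: number; width: number; height: number };
type Observation = { bbox: BBox; text: string };

// Step 1: keep observations with more than one character once punctuation is stripped.
const filterNoisy = (o: Observation): boolean =>
  o.text.replace(/[،,؛;؟?۔.:\-()]/g, '').length > 1;

// Step 2: mirror x across the image width.
const flip = (obs: Observation[], imageWidth: number): Observation[] =>
  obs.map((o) => ({ ...o, bbox: { ...o.bbox, x: imageWidth - o.bbox.x - o.bbox.width } }));

// Step 3: snap x values that lie within the threshold of the smallest x.
const normalizeX = (obs: Observation[], dpi: number, standardDPI = 300): Observation[] => {
  const thresholdPx = (standardDPI / dpi) * 5;
  const minX = Math.min(...obs.map((o) => o.bbox.x));
  return obs.map((o) =>
    Math.abs(o.bbox.x - minX) <= thresholdPx ? { ...o, bbox: { ...o.bbox, x: minX } } : o
  );
};

// Made-up sample: two right-aligned lines and one punctuation-only fragment.
const input: Observation[] = [
  { bbox: { x: 548, y: 50, width: 150, height: 20 }, text: 'السلام عليكم' },
  { bbox: { x: 400, y: 80, width: 300, height: 20 }, text: 'ورحمة الله' },
  { bbox: { x: 10, y: 110, width: 5, height: 20 }, text: '.' },
];

const kept = input.filter(filterNoisy);
const result: Observation[] = kept.length === 0 ? [] : normalizeX(flip(kept, 800), 300);

console.log(result.map((o) => o.bbox.x)); // [100, 100]
```

The flipped x values are 102 and 100. At 300 DPI the threshold is 5px, so both snap to 100.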

Coordinate Normalization

After RTL flipping, coordinates are normalized to create clean alignment:
export const normalizeObservationsX = (
  observations: Observation[],
  dpi: number,
  standardDPI: number = 300
) => {
  const thresholdPx = (standardDPI / dpi) * 5;
  const minX = Math.min(...observations.map((o) => o.bbox.x));

  return observations.map((o) => {
    if (Math.abs(o.bbox.x - minX) <= thresholdPx) {
      return { ...o, bbox: { ...o.bbox, x: minX } };
    }
    return o;
  });
};
Reference: src/utils/normalization.ts:84

Why Normalize?

OCR engines may produce slightly inconsistent x-coordinates for aligned text:
Line 1: x=50.2
Line 2: x=50.8
Line 3: x=49.5
Normalization snaps these to a common baseline:
Line 1: x=49.5
Line 2: x=49.5
Line 3: x=49.5
Benefits:
  • Cleaner paragraph detection
  • Better indent recognition
  • Improved poetry centering
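To make the threshold concrete, the sketch below evaluates the formula quoted above and snaps the three example lines. `thresholdPx` and `snapToMin` are illustrative names, and the fourth value (x=80) is an assumed indented line added to show that coordinates outside the threshold are left alone.

```typescript
// Threshold formula as quoted above, with the default standardDPI of 300.
const thresholdPx = (dpi: number, standardDPI = 300): number => (standardDPI / dpi) * 5;

// Snap every x within the threshold of the minimum onto the minimum.
const snapToMin = (xs: number[], dpi: number): number[] => {
  const minX = Math.min(...xs);
  const limit = thresholdPx(dpi);
  return xs.map((x) => (Math.abs(x - minX) <= limit ? minX : x));
};

console.log(thresholdPx(300)); // 5
console.log(thresholdPx(150)); // 10

// Three nearly aligned lines plus one indented line (x=80 is an assumed value).
const snapped = snapToMin([50.2, 50.8, 49.5, 80], 300);
console.log(snapped); // [49.5, 49.5, 49.5, 80]
```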

Arabic Text Example

Input (LTR Coordinates)

const observations = [
  {
    bbox: { x: 100, y: 50, width: 150, height: 20 },
    text: "السلام"
  },
  {
    bbox: { x: 260, y: 50, width: 100, height: 20 },
    text: "عليكم"
  }
];

const pageWidth = 800;

After RTL Flip

[
  {
    bbox: { x: 550, y: 50, width: 150, height: 20 },
    text: "السلام"  // 800 - 100 - 150 = 550
  },
  {
    bbox: { x: 440, y: 50, width: 100, height: 20 },
    text: "عليكم"   // 800 - 260 - 100 = 440
  }
]

Logical Order

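After the flip, x measures how far each box's right edge sits from the right edge of the page, so a smaller x means further right:
  • “عليكم” at x=440 is the rightmost word on the page (original x=260)
  • “السلام” at x=550 sits to its left (original x=100)
Reading the observations in ascending x order therefore walks the line from right to left.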

RTL Poetry Detection

RTL transformation ensures poetry detection works correctly:

Hemistich Example

// Arabic poetry hemistichs (before RTL flip)
const observations = [
  {
    bbox: { x: 100, y: 200, width: 220, height: 18 },
    text: "في البدء كانت الكلمة"
  },
  {
    bbox: { x: 480, y: 200, width: 210, height: 18 },
    text: "والكلمة عند الله"
  }
];
After RTL flip (imageWidth = 800):
[
  {
    bbox: { x: 480, y: 200, width: 220, height: 18 },
    text: "في البدء كانت الكلمة"  // 800 - 100 - 220 = 480
  },
  {
    bbox: { x: 110, y: 200, width: 210, height: 18 },
    text: "والكلمة عند الله"      // 800 - 480 - 210 = 110
  }
]
Combined bounding box:
  • Left edge: min(480, 110) = 110
  • Right edge: max(480+220, 110+210) = max(700, 320) = 700
  • Width: 700 - 110 = 590
  • Center: 110 + 590/2 = 405
  • Page center: 800/2 = 400
  • Difference: 5px (within tolerance) ✓
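The arithmetic above can be reproduced as follows. This is a sketch of the calculation only: `centerOffset` is a hypothetical helper, and this page does not state the tolerance Kokokor applies, so none is assumed here. Because the flip mirrors boxes around the page centre, the size of the offset is the same before and after the flip.

```typescript
type Box = { x: number; width: number };

// Distance between the centre of the combined bounding box and the page centre.
const centerOffset = (boxes: Box[], imageWidth: number): number => {
  const left = Math.min(...boxes.map((b) => b.x));
  const right = Math.max(...boxes.map((b) => b.x + b.width));
  const center = left + (right - left) / 2;
  return Math.abs(center - imageWidth / 2);
};

const imageWidth = 800;
const beforeFlip: Box[] = [{ x: 100, width: 220 }, { x: 480, width: 210 }];
const afterFlip: Box[] = beforeFlip.map((b) => ({ ...b, x: imageWidth - b.x - b.width }));

console.log(afterFlip.map((b) => b.x));            // [480, 110]
console.log(centerOffset(afterFlip, imageWidth));  // 5
console.log(centerOffset(beforeFlip, imageWidth)); // 5
```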

Configuration

Enable RTL Processing

import { reconstructParagraphs } from 'kokokor';

const result = reconstructParagraphs(
  { observations, page, layout },
  {
    line: {
      isRTL: true,  // Enable RTL coordinate flipping
    },
  }
);

Disable for LTR Languages

const result = reconstructParagraphs(
  { observations, page, layout },
  {
    line: {
      isRTL: false,  // Disable RTL flipping for English, etc.
    },
  }
);
RTL is enabled by default (isRTL: true). Explicitly set isRTL: false for left-to-right languages like English, French, or Spanish.

Mixed Text Handling

RTL Text with LTR Numbers

Arabic text often contains embedded Latin numerals:
"في عام 2024 ميلادية"
Kokokor handles this correctly because:
  1. OCR engines typically return observations in visual order (left to right on page)
  2. RTL coordinate flip maintains relative positions
  3. Text content remains unchanged (only coordinates flip)
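Points 2 and 3 can be verified directly: the flip changes only `bbox.x`, and the gap between neighbouring boxes is preserved. The sketch below uses made-up coordinates for a numeral and the word to its right. `flipOne` and `gap` are illustrative helpers, not Kokokor functions.

```typescript
type BBox = { x: number; y: number; width: number; height: number };
type Observation = { bbox: BBox; text: string };

// Same arithmetic as mapOcrResultToRTLObservations, applied to one observation.
const flipOne = (o: Observation, imageWidth: number): Observation => ({
  ...o,
  bbox: { ...o.bbox, x: imageWidth - o.bbox.x - o.bbox.width },
});

// Horizontal gap between two boxes, regardless of which one comes first.
const gap = (a: BBox, b: BBox): number =>
  a.x <= b.x ? b.x - (a.x + a.width) : a.x - (b.x + b.width);

// Coordinates are made-up sample values.
const numeral: Observation = { bbox: { x: 100, y: 50, width: 60, height: 20 }, text: '2024' };
const word: Observation = { bbox: { x: 170, y: 50, width: 90, height: 20 }, text: 'عام' };

const flippedNumeral = flipOne(numeral, 800);
const flippedWord = flipOne(word, 800);

console.log(flippedNumeral.text === numeral.text);        // true: text is untouched
console.log(gap(numeral.bbox, word.bbox));                // 10
console.log(gap(flippedNumeral.bbox, flippedWord.bbox));  // 10
```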

Bidirectional Text

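isRTL applies to every observation passed in a single call, so a page that mixes RTL and LTR sections is flipped as a whole. Set it for the page's dominant direction; for documents whose direction changes from page to page, see Pattern 3 under Common Patterns:
{
  line: {
    isRTL: true,  // Flip coordinates for the whole page
  }
}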

Supported RTL Languages

Arabic

Full support for Arabic script, poetry hemistichs, and diacritics

Hebrew

Hebrew text with proper coordinate transformation

Farsi/Persian

Persian poetry and prose with RTL layout

Urdu

Urdu text with Nastaliq script support

Common Patterns

Pattern 1: Arabic OCR Processing

import { reconstructParagraphs } from 'kokokor';

const arabicResult = reconstructParagraphs(
  {
    observations: ocrObservations,
    page: {
      width: 1700,
      height: 2200,
      dpiX: 300,
      dpiY: 300,
    },
    layout: {
      horizontalLines: [],  // Footnote separators
      rectangles: [],        // Heading boxes
    },
  },
  {
    line: {
      isRTL: true,
      poetryDetectionOptions: {
        // Arabic poetry often uses hemistichs
        pairWidthSimilarityRatio: 0.4,
        pairWordCountSimilarityRatio: 0.5,
      },
      poetryPairDelimiter: ' ',  // Join hemistichs with space
    },
  }
);

Pattern 2: Hebrew Processing

const hebrewResult = reconstructParagraphs(
  { observations, page, layout },
  {
    line: {
      isRTL: true,
      // Hebrew typically has less poetry formatting
      poetryDetectionOptions: {
        minWordCount: 3,  // Stricter poetry detection
      },
    },
  }
);

Pattern 3: Bilingual Documents

// Detect language per page and process accordingly
function processPage(observations, page, layout, isRTLPage) {
  return reconstructParagraphs(
    { observations, page, layout },
    {
      line: {
        isRTL: isRTLPage,
      },
    }
  );
}

// Process Arabic page
const arabicPage = processPage(arabicObs, page, layout, true);

// Process English page
const englishPage = processPage(englishObs, page, layout, false);
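The pattern above leaves open how `isRTLPage` is determined. One possible heuristic, sketched below, counts characters from the Hebrew and Arabic Unicode blocks against basic Latin letters in the OCR text. `isMostlyRTL` is a hypothetical helper, not part of Kokokor, and the simple majority rule is an assumption that may need tuning for real documents.

```typescript
// Hypothetical helper, not part of Kokokor: guesses page direction from OCR text.
// Ranges cover Hebrew, Arabic, Arabic Supplement, Arabic Extended-A,
// and the Hebrew/Arabic presentation forms.
const RTL_CHARS = /[\u0590-\u05FF\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFF]/g;
const LATIN_CHARS = /[A-Za-z]/g;

const isMostlyRTL = (texts: string[]): boolean => {
  const joined = texts.join(' ');
  const rtl = (joined.match(RTL_CHARS) ?? []).length;
  const latin = (joined.match(LATIN_CHARS) ?? []).length;
  // Digits and punctuation are ignored, so embedded numbers do not sway the result.
  return rtl > latin;
};

console.log(isMostlyRTL(['السلام عليكم']));        // true
console.log(isMostlyRTL(['Hello world']));         // false
console.log(isMostlyRTL(['في عام 2024 ميلادية'])); // true
```

The result could then be passed as the fourth argument of `processPage`, for example `processPage(obs, page, layout, isMostlyRTL(obs.map((o) => o.text)))`.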

Debugging RTL Processing

Enable Logging

const result = reconstructParagraphs(
  { observations, page, layout },
  {
    line: {
      isRTL: true,
      log: (message, ...args) => {
        console.log(`[RTL] ${message}:`, args);
      },
    },
  }
);
Output:
[RTL] mapOcrResultToRTLObservations: [Array of observations]
[RTL] normalizeObservationsX: [Array after normalization]
[RTL] indexObservationsAsLines: [Grouping info]

Visual Inspection

Compare before and after coordinates:
function inspectRTLTransform(observations, imageWidth) {
  console.log('Before RTL flip:');
  observations.forEach(o => {
    console.log(`  x=${o.bbox.x}, text="${o.text}"`);
  });

  const transformed = mapOcrResultToRTLObservations(observations, imageWidth);

  console.log('\nAfter RTL flip:');
  transformed.forEach(o => {
    console.log(`  x=${o.bbox.x}, text="${o.text}"`);
  });
}

Performance Considerations

RTL coordinate transformation is O(n), where n is the number of observations, so the overhead is minimal.
Operations:
  • Coordinate flip: one arithmetic pass over the observations
  • Normalization: one pass to find the minimum x, one pass to adjust
  • Total: three linear passes (~3n operations), plus one more for the noise filter

Edge Cases

Empty Documents

if (observations.length === 0) {
  return [];  // Early exit, no RTL processing needed
}
Reference: src/utils/paragraphs.ts:51

Single Observation

RTL flip still applies:
const single = [{ bbox: { x: 100, width: 50 }, text: "مرحبا" }];
// After flip (imageWidth=800): x = 650

Zero-Width Observations

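The noise filter checks text length, not bounding-box width. A zero-width observation is removed only when its text, after punctuation is stripped, has one character or fewer:
const filterNoisyObservations = (o: Observation) =>
  o.text?.replace(/[،,؛;؟?۔.:\-()]/g, '').length > 1;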
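To see which observations survive, the sketch below applies the punctuation-stripping filter from the preprocessing section to a few sample strings. The `Observation` type is reduced to an optional `text` field for brevity, and `?? ''` is used so that a missing text counts as empty. Both are simplifications made for this example.

```typescript
// Reduced shape for this sketch; only the text matters to the filter.
type Observation = { text?: string };

// Same punctuation class as the filter quoted in the preprocessing section.
const filterNoisyObservations = (o: Observation): boolean =>
  (o.text ?? '').replace(/[،,؛;؟?۔.:\-()]/g, '').length > 1;

const samples: Observation[] = [
  { text: 'مرحبا' }, // kept: five characters
  { text: 'لا' },    // kept: two characters
  { text: '(1)' },   // dropped: only "1" remains after stripping
  { text: '...' },   // dropped: nothing remains
  { text: 'و' },     // dropped: a single character
  {},                // dropped: no text at all
];

const keptTexts = samples.filter(filterNoisyObservations).map((o) => o.text);
console.log(keptTexts); // ['مرحبا', 'لا']
```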

Next Steps

Poetry Detection

How RTL affects poetry detection

Processing Pipeline

Where RTL transformation fits in the pipeline

Configuration

Complete RTL configuration options

Examples

Full Arabic OCR processing examples
