RTL Text Support

Overview

Kokokor provides built-in support for right-to-left (RTL) text processing, essential for languages like Arabic, Hebrew, Farsi, and Urdu. The library handles coordinate transformation and normalization to ensure accurate text reconstruction.

RTL support is enabled by default (isRTL: true). For left-to-right languages, set isRTL: false under the line options.

RTL Coordinate System

The Challenge

OCR engines typically return coordinates in a left-to-right (LTR) coordinate system where:

Origin (0, 0) is at the top-left corner
X-axis increases rightward
Y-axis increases downward

For RTL text, this creates problems:

Text flows right-to-left, but coordinates are left-to-right
Logical reading order doesn’t match spatial order
Text alignment and centering calculations are inverted

The Solution: Coordinate Flipping

Kokokor transforms RTL coordinates by flipping the x-axis:

export const mapOcrResultToRTLObservations = (
  observations: Observation[],
  imageWidth: number
) => {
  return observations.map((o) => ({
    ...o,
    bbox: {
      ...o.bbox,
      x: imageWidth - o.bbox.x - o.bbox.width
    }
  }));
};

Reference: src/utils/normalization.ts:27

Transformation Formula

newX = imageWidth - originalX - textWidth

Visual Example

Before Transformation (LTR coordinates):

0                     imageWidth (800px)
|---------------------|------------------|
        Arabic text at x=100, width=50

After Transformation (RTL coordinates):

0                     imageWidth (800px)
|---------------------|------------------|
                 Arabic text at x=650

Calculation: 650 = 800 - 100 - 50
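The formula can be checked with a few lines of TypeScript. This is a minimal sketch rather than library code: flipX is a hypothetical helper that applies the same arithmetic as mapOcrResultToRTLObservations to a single box. It also shows that the flip is its own inverse, so applying it twice restores the original coordinate.

```typescript
// Hypothetical helper (not part of Kokokor): mirrors a box's left edge
// across the vertical center line of an image of the given width.
const flipX = (x: number, width: number, imageWidth: number): number =>
  imageWidth - x - width;

const flipped = flipX(100, 50, 800);      // 800 - 100 - 50
const restored = flipX(flipped, 50, 800); // 800 - 650 - 50

console.log(flipped, restored); // 650 100
```

Because the box width is subtracted as well, the result is still the left edge of the box in the mirrored system, not its right edge.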

Processing Pipeline

Stage 1: Preprocessing

RTL transformation happens early in the pipeline during the flipAndAlignObservations step:

export const flipAndAlignObservations = (
  observations: Observation[],
  imageWidth: number,
  dpiX: number,
  options: Partial<Pick<MapObservationsToTextLinesOptions, 'isRTL' | 'log'>> = {}
) => {
  // 1. Filter noise
  observations = observations.filter(filterNoisyObservations);

  if (observations.length === 0) {
    return [];
  }

  // 2. Apply RTL coordinate flip
  if (options.isRTL) {
    observations = mapOcrResultToRTLObservations(observations, imageWidth);
  }

  // 3. Normalize x-coordinates for alignment
  return normalizeObservationsX(observations, dpiX);
};

Reference: src/utils/paragraphs.ts:43

Preprocessing Steps

Noise Filtering

Remove invalid or noisy observations

const filterNoisyObservations = (o: Observation) =>
  o.text?.replace(/[،,؛;؟?۔.:\-()]/g, '').length > 1;

Reference: src/utils/normalization.ts:54

RTL Coordinate Flip

Transform x-coordinates for RTL text flow

if (options.isRTL) {
  observations = mapOcrResultToRTLObservations(observations, imageWidth);
}

X-Coordinate Normalization

Align observations to a clean left edge

return normalizeObservationsX(observations, dpiX);

Coordinate Normalization

After RTL flipping, coordinates are normalized to create clean alignment:

export const normalizeObservationsX = (
  observations: Observation[],
  dpi: number,
  standardDPI: number = 300
) => {
  const thresholdPx = (standardDPI / dpi) * 5;
  const minX = Math.min(...observations.map((o) => o.bbox.x));

  return observations.map((o) => {
    if (Math.abs(o.bbox.x - minX) <= thresholdPx) {
      return { ...o, bbox: { ...o.bbox, x: minX } };
    }
    return o;
  });
};

Reference: src/utils/normalization.ts:84

Why Normalize?

OCR engines may produce slightly inconsistent x-coordinates for aligned text:

Line 1: x=50.2
Line 2: x=50.8
Line 3: x=49.5

Normalization snaps each x that lies within the threshold of the smallest value to that smallest value, giving the lines a common left edge:

Line 1: x=49.5
Line 2: x=49.5
Line 3: x=49.5

Benefits:

Cleaner paragraph detection
Better indent recognition
Improved poetry centering
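The snapping behavior can be reproduced in isolation. The sketch below copies the logic of normalizeObservationsX into a standalone function (named snapLeftEdges here to make clear it is a copy, with simplified stand-in types) and runs it on the three lines above plus one hypothetical indented line. At 300 DPI the threshold is 5px, so the indented line is left alone.

```typescript
// Minimal stand-in types; the real Observation type may carry more fields.
type BBox = { x: number; y: number; width: number; height: number };
type Obs = { bbox: BBox; text: string };

// Same logic as normalizeObservationsX above, copied for a standalone run.
const snapLeftEdges = (observations: Obs[], dpi: number, standardDPI = 300): Obs[] => {
  const thresholdPx = (standardDPI / dpi) * 5;
  const minX = Math.min(...observations.map((o) => o.bbox.x));
  return observations.map((o) =>
    Math.abs(o.bbox.x - minX) <= thresholdPx ? { ...o, bbox: { ...o.bbox, x: minX } } : o
  );
};

const lines: Obs[] = [
  { bbox: { x: 50.2, y: 10, width: 400, height: 20 }, text: 'line 1' },
  { bbox: { x: 50.8, y: 40, width: 400, height: 20 }, text: 'line 2' },
  { bbox: { x: 49.5, y: 70, width: 400, height: 20 }, text: 'line 3' },
  // Hypothetical indented line, well outside the 5px threshold.
  { bbox: { x: 90, y: 100, width: 360, height: 20 }, text: 'indented line' },
];

const snappedXs = snapLeftEdges(lines, 300).map((o) => o.bbox.x);
console.log(snappedXs); // [ 49.5, 49.5, 49.5, 90 ]
```

Only the x value changes; widths are not adjusted, so the right edge of a snapped box moves by the same small amount.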

Arabic Text Example

Input (LTR Coordinates)

const observations = [
  {
    bbox: { x: 100, y: 50, width: 150, height: 20 },
    text: "السلام"
  },
  {
    bbox: { x: 260, y: 50, width: 100, height: 20 },
    text: "عليكم"
  }
];

const pageWidth = 800;

After RTL Flip

[
  {
    bbox: { x: 550, y: 50, width: 150, height: 20 },
    text: "السلام"  // 800 - 100 - 150 = 550
  },
  {
    bbox: { x: 440, y: 50, width: 100, height: 20 },
    text: "عليكم"   // 800 - 260 - 100 = 440
  }
]

Logical Order

Now the observations are in correct RTL reading order:

First word (rightmost): “السلام” at x=550
Second word (leftmost): “عليكم” at x=440

RTL Poetry Detection

RTL transformation ensures poetry detection works correctly:

Hemistich Example

// Arabic poetry hemistichs (before RTL flip)
const observations = [
  {
    bbox: { x: 100, y: 200, width: 220, height: 18 },
    text: "في البدء كانت الكلمة"
  },
  {
    bbox: { x: 480, y: 200, width: 210, height: 18 },
    text: "والكلمة عند الله"
  }
];

After RTL flip (imageWidth = 800):

[
  {
    bbox: { x: 480, y: 200, width: 220, height: 18 },
    text: "في البدء كانت الكلمة"  // 800 - 100 - 220 = 480
  },
  {
    bbox: { x: 110, y: 200, width: 210, height: 18 },
    text: "والكلمة عند الله"      // 800 - 480 - 210 = 110
  }
]

Combined bounding box:

Left edge: min(480, 110) = 110
Right edge: max(480+220, 110+210) = max(700, 320) = 700
Width: 700 - 110 = 590
Center: 110 + 590/2 = 405
Page center: 800/2 = 400
Difference: 5px (within tolerance) ✓
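The arithmetic above can be sketched as a small function. Both the helper name isRoughlyCentered and the 10px tolerance are assumptions made for illustration; this page does not state the tolerance Kokokor actually uses or how its poetry detector is implemented.

```typescript
type BBox = { x: number; y: number; width: number; height: number };

// Hypothetical helper: merges boxes horizontally and compares the merged
// center with the page center. The tolerance is an illustrative assumption.
const isRoughlyCentered = (boxes: BBox[], pageWidth: number, tolerancePx: number) => {
  const left = Math.min(...boxes.map((b) => b.x));
  const right = Math.max(...boxes.map((b) => b.x + b.width));
  const center = left + (right - left) / 2;
  const offset = Math.abs(center - pageWidth / 2);
  return { left, right, center, offset, centered: offset <= tolerancePx };
};

// The two hemistichs after the RTL flip, as listed above.
const hemistichs: BBox[] = [
  { x: 480, y: 200, width: 220, height: 18 },
  { x: 110, y: 200, width: 210, height: 18 },
];

const centering = isRoughlyCentered(hemistichs, 800, 10);
console.log(centering.center, centering.offset, centering.centered); // 405 5 true
```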

Configuration

Enable RTL Processing

import { reconstructParagraphs } from 'kokokor';

const result = reconstructParagraphs(
  { observations, page, layout },
  {
    line: {
      isRTL: true,  // Enable RTL coordinate flipping
    },
  }
);

Disable for LTR Languages

const result = reconstructParagraphs(
  { observations, page, layout },
  {
    line: {
      isRTL: false,  // Disable RTL flipping for English, etc.
    },
  }
);

RTL is enabled by default (isRTL: true). Explicitly set isRTL: false for left-to-right languages like English, French, or Spanish.

Mixed Text Handling

RTL Text with LTR Numbers

Arabic text often contains embedded Western digits, which are written left to right:

"في عام 2024 ميلادية"

Kokokor handles this correctly because:

OCR engines typically return observations in visual order (left to right on page)
RTL coordinate flip maintains relative positions
Text content remains unchanged (only coordinates flip)
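The points above can be sketched with the example sentence split into three observations. The positions are hypothetical, and flipObservations is a standalone copy of the arithmetic in mapOcrResultToRTLObservations with simplified types. The flip mirrors the left-to-right order of the boxes and leaves every text string, including the digits, untouched.

```typescript
type BBox = { x: number; y: number; width: number; height: number };
type Obs = { bbox: BBox; text: string };

// Same arithmetic as mapOcrResultToRTLObservations, copied for a standalone run.
const flipObservations = (observations: Obs[], imageWidth: number): Obs[] =>
  observations.map((o) => ({
    ...o,
    bbox: { ...o.bbox, x: imageWidth - o.bbox.x - o.bbox.width },
  }));

// Hypothetical positions, listed left to right as they appear on the page.
const visualOrder: Obs[] = [
  { bbox: { x: 100, y: 50, width: 150, height: 20 }, text: 'ميلادية' },
  { bbox: { x: 300, y: 50, width: 100, height: 20 }, text: '2024' },
  { bbox: { x: 450, y: 50, width: 120, height: 20 }, text: 'في عام' },
];

const flippedObs = flipObservations(visualOrder, 800);
const textsUnchanged = flippedObs.every((o, i) => o.text === visualOrder[i].text);

console.log(flippedObs.map((o) => o.bbox.x)); // [ 550, 400, 230 ]
console.log(textsUnchanged);                  // true
```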

Bidirectional Text

For documents with both RTL and LTR sections:

Predominantly RTL

{
  line: {
    isRTL: true,  // Flip coordinates
  }
}

Predominantly LTR

{
  line: {
    isRTL: false,  // Don't flip coordinates
  }
}

Mixed Document

Process RTL and LTR pages separately:

// RTL pages
const rtlResult = reconstructParagraphs(
  { observations: rtlObservations, page, layout },
  { line: { isRTL: true } }
);

// LTR pages
const ltrResult = reconstructParagraphs(
  { observations: ltrObservations, page, layout },
  { line: { isRTL: false } }
);

Supported RTL Languages

Arabic

Full support for Arabic script, poetry hemistichs, and diacritics

Hebrew

Hebrew text with proper coordinate transformation

Farsi/Persian

Persian poetry and prose with RTL layout

Urdu

Urdu text with Nastaliq script support

Common Patterns

Pattern 1: Arabic OCR Processing

import { reconstructParagraphs } from 'kokokor';

const arabicResult = reconstructParagraphs(
  {
    observations: ocrObservations,
    page: {
      width: 1700,
      height: 2200,
      dpiX: 300,
      dpiY: 300,
    },
    layout: {
      horizontalLines: [],  // Footnote separators
      rectangles: [],        // Heading boxes
    },
  },
  {
    line: {
      isRTL: true,
      poetryDetectionOptions: {
        // Arabic poetry often uses hemistichs
        pairWidthSimilarityRatio: 0.4,
        pairWordCountSimilarityRatio: 0.5,
      },
      poetryPairDelimiter: ' ',  // Join hemistichs with space
    },
  }
);

Pattern 2: Hebrew Processing

const hebrewResult = reconstructParagraphs(
  { observations, page, layout },
  {
    line: {
      isRTL: true,
      // Hebrew typically has less poetry formatting
      poetryDetectionOptions: {
        minWordCount: 3,  // Stricter poetry detection
      },
    },
  }
);

Pattern 3: Bilingual Documents

// Detect language per page and process accordingly
function processPage(observations, page, layout, isRTLPage) {
  return reconstructParagraphs(
    { observations, page, layout },
    {
      line: {
        isRTL: isRTLPage,
      },
    }
  );
}

// Process Arabic page
const arabicPage = processPage(arabicObs, page, layout, true);

// Process English page
const englishPage = processPage(englishObs, page, layout, false);
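Pattern 3 leaves the value of isRTLPage to the caller. One possible way to choose it is to count characters by script, sketched below. isPredominantlyRTL is a hypothetical helper, not part of Kokokor, and the heuristic is deliberately crude: it compares the number of characters in the Hebrew and Arabic Unicode blocks with the number of basic Latin letters and ignores digits and punctuation.

```typescript
type Obs = { text: string };

// Hebrew, Arabic, Arabic Supplement, and the Hebrew/Arabic presentation forms.
const RTL_CHARS = /[\u0590-\u05FF\u0600-\u06FF\u0750-\u077F\uFB1D-\uFDFF\uFE70-\uFEFC]/g;
const LATIN_CHARS = /[A-Za-z]/g;

// Hypothetical heuristic: a page is treated as RTL when it contains more
// RTL-script characters than basic Latin letters.
const isPredominantlyRTL = (observations: Obs[]): boolean => {
  const text = observations.map((o) => o.text).join(' ');
  const rtlCount = (text.match(RTL_CHARS) ?? []).length;
  const latinCount = (text.match(LATIN_CHARS) ?? []).length;
  return rtlCount > latinCount;
};

console.log(isPredominantlyRTL([{ text: 'في عام 2024 ميلادية' }])); // true
console.log(isPredominantlyRTL([{ text: 'Published in 2024' }]));    // false
```

The result could then be passed as the last argument of processPage. Pages with a roughly even mix of scripts would need a more careful rule than a simple majority.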

Debugging RTL Processing

Enable Logging

const result = reconstructParagraphs(
  { observations, page, layout },
  {
    line: {
      isRTL: true,
      log: (message, ...args) => {
        console.log(`[RTL] ${message}:`, args);
      },
    },
  }
);

Output:

[RTL] mapOcrResultToRTLObservations: [Array of observations]
[RTL] normalizeObservationsX: [Array after normalization]
[RTL] indexObservationsAsLines: [Grouping info]

Visual Inspection

Compare before and after coordinates:

function inspectRTLTransform(observations, imageWidth) {
  console.log('Before RTL flip:');
  observations.forEach(o => {
    console.log(`  x=${o.bbox.x}, text="${o.text}"`);
  });

  const transformed = mapOcrResultToRTLObservations(observations, imageWidth);

  console.log('\nAfter RTL flip:');
  transformed.forEach(o => {
    console.log(`  x=${o.bbox.x}, text="${o.text}"`);
  });
}

Performance Considerations

RTL coordinate transformation is O(n), where n is the number of observations, so its cost grows linearly with the amount of text on the page.

Operations:

Coordinate flip: Simple arithmetic per observation
Normalization: Single pass to find minimum, single pass to adjust
Total: ~3n operations (one pass for the flip, two for normalization)

Edge Cases

Empty Documents

if (observations.length === 0) {
  return [];  // Early exit, no RTL processing needed
}

Reference: src/utils/paragraphs.ts:51

Single Observation

RTL flip still applies:

const single = [{ bbox: { x: 100, width: 50 }, text: "مرحبا" }];
// After flip (imageWidth=800): x = 650

Zero-Width Observations

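Observations with no meaningful text are filtered out during noise removal. The filter checks the length of the text after stripping punctuation, not the bbox width:

const filterNoisyObservations = (o: Observation) =>
  o.text?.replace(/[،,؛;؟?۔.:\-()]/g, '').length > 1;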
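To see how the noise filter from the preprocessing steps treats short or empty text, it can be run on a few sample strings. The sketch below restates that filter with \u escapes for the Arabic and Urdu punctuation marks and a simplified stand-in type; the sample values are made up for illustration.

```typescript
type Obs = { text?: string };

// Same character class as the filter shown in the preprocessing steps,
// written with \u escapes: U+060C, U+061B, U+061F, and U+06D4.
const PUNCTUATION = /[\u060C,\u061B;\u061F?\u06D4.:\-()]/g;

const filterNoisyObservations = (o: Obs): boolean =>
  (o.text ?? '').replace(PUNCTUATION, '').length > 1;

const samples: Obs[] = [
  { text: 'مرحبا' },  // five letters: kept
  { text: '\u060C' }, // a lone Arabic comma: dropped
  { text: 'ب.' },     // one letter left after stripping: dropped
  {},                 // missing text: dropped
];

const kept = samples.map(filterNoisyObservations);
console.log(kept); // [ true, false, false, false ]
```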

Next Steps

Poetry Detection

How RTL affects poetry detection

Processing Pipeline

Where RTL transformation fits in the pipeline

Configuration

Complete RTL configuration options

Examples

Full Arabic OCR processing examples
