extractTranslationIds

Overview

Extracts all translation marker IDs from normalized text, preserving their order of appearance. This is useful for validating response structure and detecting missing or duplicate IDs.

Function Signature

extractTranslationIds(text: string): string[]

Parameters

text (string, required)

Translation text containing markers in the format “ID - Translation”. For best results, normalize it with normalizeTranslationText first.

Returns

ids (string[])

Array of extracted IDs in the order they appear in the text.

Usage

Basic Example

import { extractTranslationIds } from 'wobble-bibble';

const response = `P1 - First translation
P2 - Second translation
P3 - Third translation`;

const ids = extractTranslationIds(response);
console.log(ids); // ['P1', 'P2', 'P3']

Detecting Duplicates

const response = `P1 - First
P2 - Second
P1 - Duplicate`;

const ids = extractTranslationIds(response);
console.log(ids); // ['P1', 'P2', 'P1']

// Check for duplicates
const hasDuplicates = ids.length !== new Set(ids).size;
console.log(hasDuplicates); // true
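
The size comparison above only says that some ID repeats. When the repeated IDs themselves are needed (for example, to report them or request a retry), a small helper can collect them. The sketch below is a minimal illustration that works on any ID array such as the one returned by extractTranslationIds; `findDuplicateIds` is a hypothetical name and is not part of wobble-bibble.

```typescript
// Hypothetical helper (not a wobble-bibble export): returns each ID that
// occurs more than once, listed in the order its first repeat is seen.
function findDuplicateIds(ids: string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const id of ids) {
    if (seen.has(id)) {
      duplicates.add(id); // second or later occurrence
    } else {
      seen.add(id);
    }
  }
  return Array.from(duplicates);
}

const duplicateIds = findDuplicateIds(['P1', 'P2', 'P1', 'P3', 'P2', 'P1']);
console.log(duplicateIds); // ['P1', 'P2']
```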

Validating Expected IDs

const expected = ['P1', 'P2', 'P3'];
const response = `P1 - Text
P3 - More text`;

const actual = extractTranslationIds(response);
const missing = expected.filter(id => !actual.includes(id));

console.log(missing); // ['P2']
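
The filter above reports missing IDs only. A response can also contain IDs that were never requested, so it may help to compute both directions at once. The following is a minimal sketch under that assumption; `diffIds` and `IdDiff` are hypothetical names introduced for illustration, and the inputs are plain ID arrays such as those returned by extractTranslationIds.

```typescript
// Hypothetical result shape, introduced only for this example.
interface IdDiff {
  missing: string[];    // expected but absent from the response
  unexpected: string[]; // present in the response but never requested
}

// Hypothetical helper (not a wobble-bibble export). Sets keep each
// membership check constant-time instead of rescanning the arrays.
function diffIds(expected: string[], actual: string[]): IdDiff {
  const expectedSet = new Set(expected);
  const actualSet = new Set(actual);
  return {
    missing: expected.filter((id) => !actualSet.has(id)),
    unexpected: actual.filter((id) => !expectedSet.has(id)),
  };
}

const idDiff = diffIds(['P1', 'P2', 'P3'], ['P1', 'P3', 'P9']);
console.log(idDiff.missing);    // ['P2']
console.log(idDiff.unexpected); // ['P9']
```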

Handling Complex ID Formats

// Works with various ID formats
const response = `P1 - Text
P2b - More text
P123 - Even more`;

const ids = extractTranslationIds(response);
console.log(ids); // ['P1', 'P2b', 'P123']

ID Format Requirements

The function recognizes IDs that match the pattern defined by MARKER_ID_PATTERN. Valid IDs typically:

Start with a letter (often ‘P’ for paragraph)
May contain additional letters or numbers
Are followed by an optional space and a dash (a hyphen or one of the other dash characters the pattern accepts)

Examples of valid markers:

P1 - text
P2b- text
P123 -text
H5 - text (for Hadith segments)
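
To make the marker shape concrete, here is a rough sketch of a line matcher for the four examples above. The regular expression is an assumption written for illustration: it is not the library's MARKER_ID_PATTERN, it handles only the ASCII hyphen, and the real pattern may be stricter or looser. `SKETCH_MARKER` and `sketchMarkerId` are hypothetical names.

```typescript
// ASSUMPTION: illustrative pattern only, not the real MARKER_ID_PATTERN.
// A letter, then letters or digits, then an optional space and a hyphen.
const SKETCH_MARKER = /^([A-Za-z][A-Za-z0-9]*)\s?-/;

// Hypothetical helper: returns the ID at the start of a line, or null
// when the line does not begin with a marker of this shape.
function sketchMarkerId(line: string): string | null {
  const match = SKETCH_MARKER.exec(line);
  return match ? match[1] : null;
}

const sampleLines = ['P1 - text', 'P2b- text', 'P123 -text', 'H5 - text', 'No marker here'];
const sketchIds = sampleLines.map(sketchMarkerId);
console.log(sketchIds); // ['P1', 'P2b', 'P123', 'H5', null]
```

For real validation, rely on extractTranslationIds itself rather than a hand-written pattern like this one.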

When to Use

Use extractTranslationIds when:

Validating that all expected segments were translated
Checking for duplicate ID entries in responses
Verifying correct ID ordering in sequential translations
Building validation pipelines for LLM outputs
Debugging translation response structure issues

This function is a key component of the validation system used to detect LLM hallucinations and output errors.

Best Practices

Always normalize first: use normalizeTranslationText before calling this function so that ID extraction is reliable.

const normalized = normalizeTranslationText(rawResponse);
const ids = extractTranslationIds(normalized);

Compare with expected: validate the extracted IDs against your input segments.

const expectedIds = segments.map(s => s.id);
const actualIds = extractTranslationIds(response);
const allPresent = expectedIds.every(id => actualIds.includes(id));

Check the order: for sequential translations, verify that IDs appear in the expected order.
```
const isOrdered = JSON.stringify(actualIds) === JSON.stringify(expectedIds);
```
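
The JSON.stringify comparison gives a yes or no answer. When debugging, it is often more useful to know where the sequences first diverge. The sketch below assumes two plain ID arrays (expected and extracted); `firstOrderMismatch` is a hypothetical helper, not a wobble-bibble export.

```typescript
// Hypothetical helper: index of the first position where the two sequences
// differ, or -1 when they are identical. A length difference counts as a
// mismatch at the first index that exists in only one of the arrays.
function firstOrderMismatch(expected: string[], actual: string[]): number {
  const length = Math.max(expected.length, actual.length);
  for (let i = 0; i < length; i++) {
    if (expected[i] !== actual[i]) {
      return i;
    }
  }
  return -1;
}

const mismatchAt = firstOrderMismatch(['P1', 'P2', 'P3'], ['P1', 'P3', 'P2']);
console.log(mismatchAt); // 1
```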

Related Functions

normalizeTranslationText - Prepare text before ID extraction
parseTranslations - Extract both IDs and translations
parseTranslationsInOrder - Get ordered ID-translation pairs
