Overview
Extracts all translation marker IDs from normalized text, preserving their order of appearance. This is useful for validating response structure and detecting missing or duplicate IDs.
Function Signature
Parameters
Translation text containing markers in the format “ID - Translation”. For best results, normalize it with normalizeTranslationText first.
Returns
Array of extracted IDs in the order they appear in the text.
Usage
Basic Example
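The sketch below illustrates the behaviour described above: one text string in, an ordered array of IDs out. The library's own implementation is not shown on this page, so `extractTranslationIds` here is a hypothetical stand-in, and its regular expression is an assumption based on the ID format described under ID Format Requirements, not the real MARKER_ID_PATTERN.

```typescript
// Hypothetical stand-in for the library function; the real one may differ.
// Assumption: a marker is a letter followed by letters or digits, at the
// start of a line, followed by an optional space and a dash.
function extractTranslationIds(translationText: string): string[] {
  const markerAtLineStart = /^([A-Za-z][A-Za-z0-9]*)\s?[-\u2013\u2014]/gm;
  const ids: string[] = [];
  let match: RegExpExecArray | null;
  // Collect the captured ID of every marker, in order of appearance.
  while ((match = markerAtLineStart.exec(translationText)) !== null) {
    ids.push(match[1]);
  }
  return ids;
}

const response = [
  "P1 - First translated paragraph.",
  "P2 - Second translated paragraph.",
  "P3 - Third translated paragraph.",
].join("\n");

const basicIds = extractTranslationIds(response);
console.log(basicIds.join(", ")); // prints "P1, P2, P3"
```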
Detecting Duplicates
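Assuming a repeated marker shows up once per occurrence in the returned array, duplicates can be found by scanning that array. `findDuplicateIds` is a hypothetical helper written for this illustration, not part of the library, and the input array stands in for the result of extractTranslationIds.

```typescript
// Hypothetical helper: returns each ID that occurs more than once,
// listed once, in the order its first repeat is encountered.
function findDuplicateIds(ids: string[]): string[] {
  const duplicates: string[] = [];
  ids.forEach((id, index) => {
    const seenEarlier = ids.indexOf(id) !== index;
    if (seenEarlier && duplicates.indexOf(id) === -1) {
      duplicates.push(id);
    }
  });
  return duplicates;
}

// Stand-in for the array returned by extractTranslationIds.
const extractedWithRepeat = ["P1", "P2", "P2", "P3", "P1"];
const repeatedIds = findDuplicateIds(extractedWithRepeat);
console.log(repeatedIds.join(", ")); // prints "P2, P1"
```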
Validating Expected IDs
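One way to validate a response is to compare the extracted IDs against the IDs of the segments that were sent for translation. The sketch below assumes both lists are plain string arrays; `compareIds` and the `IdComparison` shape are hypothetical names introduced for this example.

```typescript
// Hypothetical result shape for this example.
interface IdComparison {
  missing: string[];    // expected but absent from the response
  unexpected: string[]; // present in the response but never requested
}

// Hypothetical helper: compares expected segment IDs with extracted ones.
function compareIds(expected: string[], extracted: string[]): IdComparison {
  return {
    missing: expected.filter((id) => extracted.indexOf(id) === -1),
    unexpected: extracted.filter((id) => expected.indexOf(id) === -1),
  };
}

const expectedIds = ["P1", "P2", "P3"];
// Stand-in for the array returned by extractTranslationIds.
const idsFromResponse = ["P1", "P3", "P4"];

const comparison = compareIds(expectedIds, idsFromResponse);
console.log("missing:", comparison.missing.join(", "));       // prints "missing: P2"
console.log("unexpected:", comparison.unexpected.join(", ")); // prints "unexpected: P4"
```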
Handling Complex ID Formats
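To see how spacing and suffix variations might be handled, the sketch below tests a single line at a time against an assumed marker pattern. The regular expression is a guess at the format described on this page (a leading letter, further letters or digits, an optional space, then a dash); it is not the library's MARKER_ID_PATTERN, and `markerIdOf` is a hypothetical helper.

```typescript
// Assumed pattern, for illustration only; the real MARKER_ID_PATTERN may differ.
const assumedMarkerPattern = /^([A-Za-z][A-Za-z0-9]*)\s?[-\u2013\u2014]/;

// Hypothetical helper: returns the marker ID of a line, or null if none.
function markerIdOf(line: string): string | null {
  const match = assumedMarkerPattern.exec(line);
  return match ? match[1] : null;
}

const sampleLines = ["P1 - text", "P2b- text", "P123 -text", "H5 - text", "12 - text"];
const parsedMarkers = sampleLines.map((line) => markerIdOf(line) ?? "(no marker)");
console.log(parsedMarkers.join(", ")); // prints "P1, P2b, P123, H5, (no marker)"
```

The last sample is rejected under this assumed pattern because it does not start with a letter.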
ID Format Requirements
The function recognizes IDs that match the pattern defined by MARKER_ID_PATTERN. Valid IDs typically:
- Start with a letter (often ‘P’ for paragraph)
- May contain additional letters or numbers
- Are followed by an optional space and a dash (a hyphen or one of the other accepted dash characters)
Examples of valid markers:
- P1 - text
- P2b- text
- P123 -text
- H5 - text (for Hadith segments)
When to Use
Use extractTranslationIds when:
- Validating that all expected segments were translated
- Checking for duplicate ID entries in responses
- Verifying correct ID ordering in sequential translations
- Building validation pipelines for LLM outputs
- Debugging translation response structure issues
Best Practices
- Always normalize first: Use normalizeTranslationText before calling this function to ensure reliable ID extraction
- Compare with expected: Always validate extracted IDs against your input segments
- Check for order: For sequential translations, verify IDs appear in the correct order
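The order check in the last practice can be sketched as follows, assuming the expected IDs are available as an array in their intended sequence. `firstOutOfOrderId` is a hypothetical helper, not a library function; it ignores IDs that are not in the expected list, since those are better reported by a separate comparison.

```typescript
// Hypothetical helper: returns the first extracted ID that appears earlier
// in the expected sequence than an ID already seen, or null if the
// extracted IDs follow the expected order.
function firstOutOfOrderId(expected: string[], extracted: string[]): string | null {
  let lastPosition = -1;
  for (const id of extracted) {
    const position = expected.indexOf(id);
    if (position === -1) continue; // unknown IDs are not an ordering problem
    if (position < lastPosition) return id;
    lastPosition = position;
  }
  return null;
}

const expectedSequence = ["P1", "P2", "P3"];
// Stand-in for the array returned by extractTranslationIds.
const shuffledIds = ["P1", "P3", "P2"];

const offender = firstOutOfOrderId(expectedSequence, shuffledIds);
console.log(offender); // prints "P2"
```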
Related Functions
- normalizeTranslationText - Prepare text before ID extraction
- parseTranslations - Extract both IDs and translations
- parseTranslationsInOrder - Get ordered ID-translation pairs