Detect and handle LLM translation errors with wobble-bibble’s validation system
Wobble-bibble includes a comprehensive validation system to catch common LLM translation errors like invented IDs, Arabic script leaks, malformed markers, and structural problems.
The validateTranslationResponse() function checks an LLM’s translation output against your source segments:
import { validateTranslationResponse } from 'wobble-bibble';

const segments = [
  { id: 'P1', text: 'هذا نص عربي طويل يحتوي على محتوى كافٍ للترجمة' },
  { id: 'P2', text: 'نص عربي آخر للترجمة' }
];

const llmOutput = `P1 - This is a sufficiently long English translation.
P2 - Another Arabic text for translation.`;

const result = validateTranslationResponse(segments, llmOutput);

if (result.errors.length > 0) {
  console.error('Validation errors found:', result.errors);
} else {
  console.log('Translation is valid!');
}
Validation only checks segments that appear in the LLM output. If your corpus has 100 segments but the LLM only translated 10, validation only checks those 10.
The validation result contains three key properties:
interface ValidationResponseResult {
  // IDs found in the response (in order)
  parsedIds: string[];
  // Normalized version of the response (with formatting fixes)
  normalizedResponse: string;
  // Array of validation errors (empty if valid)
  errors: ValidationError[];
}
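Because validation covers only the segments present in the response, skipped segments have to be detected separately. The following is a minimal sketch, assuming `parsedIds` holds the IDs exactly as they are written in your segments. `findUntranslatedIds` and the `Segment` shape are hypothetical helpers for illustration, not wobble-bibble exports.

```typescript
// Hypothetical helper types and functions (not wobble-bibble exports).
interface Segment {
  id: string;
  text: string;
}

// Returns the IDs of source segments that never appear in the parsed response.
function findUntranslatedIds(segments: Segment[], parsedIds: string[]): string[] {
  const seen = new Set(parsedIds);
  return segments
    .filter((segment) => !seen.has(segment.id))
    .map((segment) => segment.id);
}

// Example: three source segments, but the response only contained P1 and P3.
const corpus: Segment[] = [
  { id: 'P1', text: 'نص' },
  { id: 'P2', text: 'نص' },
  { id: 'P3', text: 'نص' },
];
const untranslated = findUntranslatedIds(corpus, ['P1', 'P3']);
console.log(untranslated); // ['P2']
```

In practice you would pass `result.parsedIds` as the second argument and re-send the missing segments to the LLM.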
Each error includes detailed information for debugging:
interface ValidationError {
  // Machine-readable error type
  type: ValidationErrorType;
  // Human-readable error message
  message: string;
  // Character range in the original response (end is exclusive)
  range: { start: number; end: number };
  // The exact text that caused the error
  matchText: string;
  // Segment ID where error occurred (if applicable)
  id?: string;
  // Stable rule identifier for tooling
  ruleId?: string;
}
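Since `range.end` is exclusive, the range can be passed directly to `String.prototype.slice` to show the flagged text in context. This sketch assumes the offsets are ordinary JavaScript string indices into the raw response. `ErrorLocation` is a local stand-in for the `ValidationError` fields used here, and `describeError` is a hypothetical helper; the sample error object is made up for illustration.

```typescript
// Local stand-in for the ValidationError fields this helper needs.
interface ErrorLocation {
  type: string;
  message: string;
  range: { start: number; end: number };
  id?: string;
}

// Hypothetical helper: renders an error with a few characters of context
// on each side of the flagged span.
function describeError(response: string, error: ErrorLocation, context = 10): string {
  const { start, end } = error.range;
  const before = response.slice(Math.max(0, start - context), start);
  const flagged = response.slice(start, end); // end is exclusive, like slice()
  const after = response.slice(end, end + context);
  const where = error.id ? ` in ${error.id}` : '';
  return `[${error.type}]${where}: ${error.message} | ${before}>>>${flagged}<<<${after}`;
}

// Made-up error object for illustration only.
const sampleResponse = 'P1 - THIS IS VERY VERY LOUD.';
const sampleError: ErrorLocation = {
  type: 'all_caps',
  message: 'Sample message',
  range: { start: 5, end: 27 },
  id: 'P1',
};
const described = describeError(sampleResponse, sampleError);
console.log(described);
// [all_caps] in P1: Sample message | P1 - >>>THIS IS VERY VERY LOUD<<<.
```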
length_mismatch - Translation too short
The translation is suspiciously short compared to the Arabic source (heuristic check).
const longArabic = 'هو هذا الذي يسمونه بالمضاف المحذوف...';
const segments = [{ id: 'P1', text: longArabic }];

// ❌ Too short for long Arabic
const response = `P1 - Short.`;
const result = validateTranslationResponse(segments, response);
// result.errors[0].type === 'length_mismatch'
collapsed_speakers - Speaker labels mid-line
Speaker labels appear in the middle of a line instead of starting a new line.
Runs of empty parentheses are flagged as well. The following response exceeds the threshold of 3:

const segments = [{ id: 'P1', text: 'نص' }];

// ❌ Too many empty parentheses (threshold is 3)
const bad = `P1 - One () two () three () four () five ().`;
const result = validateTranslationResponse(segments, bad);
// result.errors.length === 5
all_caps - Excessive ALL CAPS detected
Run of uppercase words detected (“shouting” text).
const segments = [{ id: 'P1', text: 'نص' }];

// ❌ Too many ALL CAPS words in a row
const bad = `P1 - THIS IS VERY VERY LOUD.`;
const result = validateTranslationResponse(segments, bad);
// result.errors[0].type === 'all_caps'

// ✅ Acronyms are fine
const good = `P1 - The USA is fine.`;
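To make the idea of a "run" concrete, here is a rough sketch of how consecutive uppercase words could be counted. This is not wobble-bibble's implementation, whose exact tokenization rules are not described here. `longestAllCapsRun` and `hasAllCapsRun` are hypothetical names, and the default of 5 mirrors the documented default threshold.

```typescript
// Illustrative only: counts the longest run of consecutive all-uppercase words.
function longestAllCapsRun(text: string): number {
  let longest = 0;
  let current = 0;
  for (const token of text.split(/\s+/)) {
    // Strip digits and punctuation so "LOUD." still counts as a word.
    const word = token.replace(/[^A-Za-z]/g, '');
    const isAllCaps = word.length > 0 && word === word.toUpperCase();
    current = isAllCaps ? current + 1 : 0;
    longest = Math.max(longest, current);
  }
  return longest;
}

// Flags the text when the longest run reaches the threshold.
function hasAllCapsRun(text: string, threshold = 5): boolean {
  return longestAllCapsRun(text) >= threshold;
}

console.log(hasAllCapsRun('P1 - THIS IS VERY VERY LOUD.')); // true
console.log(hasAllCapsRun('P1 - The USA is fine.'));        // false
```

A single acronym produces a run of one, which is why it stays well below the threshold.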
newline_after_id - Formatting error
Newline appears immediately after “ID -” instead of the translation text.
const segments = [{ id: 'P1', text: 'نص' }];

// ❌ Newline after marker
const bad = `P1 -\nText`;
const result = validateTranslationResponse(segments, bad);
// result.errors[0].type === 'newline_after_id'
multiword_translit_without_gloss - Missing English gloss
Multi-word transliteration phrase without parenthetical English explanation.
const segments = [{ id: 'P1', text: 'نص عربي' }];

// ❌ No gloss for multi-word phrase
const bad = `P1 - He advised al-hajr fi al-madajīʿ.`;
const result = validateTranslationResponse(segments, bad);
// result.errors[0].type === 'multiword_translit_without_gloss'

// ✅ Has gloss
const good = `P1 - He advised al-hajr fi al-madajīʿ (marital bed abandonment).`;
You can customize validation behavior with the config option:
const result = validateTranslationResponse(segments, response, {
  config: {
    // Require at least 4 consecutive ALL CAPS words to flag (default: 5)
    allCapsWordRunThreshold: 4
  }
});
Use VALIDATION_ERROR_TYPE_INFO to get human-readable descriptions:
import { VALIDATION_ERROR_TYPE_INFO } from 'wobble-bibble';

// Get description for a specific error type
const description = VALIDATION_ERROR_TYPE_INFO.arabic_leak.description;
// "Arabic script was detected in output (except ﷺ)."

// Print all error types and descriptions
for (const [type, info] of Object.entries(VALIDATION_ERROR_TYPE_INFO)) {
  console.log(`${type}: ${info.description}`);
}
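The descriptions can be combined with a validation result to produce a compact report, for example to log or to feed back to the LLM on a retry. This is a sketch under assumptions: `summarizeErrors`, `ReportableError`, and `TypeInfo` are hypothetical names, and the info map is taken as a parameter on the assumption that `VALIDATION_ERROR_TYPE_INFO` is an object keyed by error type with a `description` field, as the usage above suggests.

```typescript
// Local stand-ins so the sketch is self-contained.
interface ReportableError {
  type: string;
}
type TypeInfo = Record<string, { description: string } | undefined>;

// Hypothetical helper: one line per error type, with a count and description.
function summarizeErrors(errors: ReportableError[], info: TypeInfo): string[] {
  const counts = new Map<string, number>();
  for (const error of errors) {
    counts.set(error.type, (counts.get(error.type) ?? 0) + 1);
  }
  return Array.from(counts.entries()).map(([type, count]) => {
    const description = info[type]?.description ?? 'no description available';
    return `${type} x${count}: ${description}`;
  });
}

// Sample data for illustration; in real use, pass result.errors and
// VALIDATION_ERROR_TYPE_INFO instead.
const sampleInfo: TypeInfo = {
  arabic_leak: { description: 'Arabic script was detected in output (except ﷺ).' },
};
const sampleErrors: ReportableError[] = [
  { type: 'all_caps' },
  { type: 'arabic_leak' },
  { type: 'all_caps' },
];
const summary = summarizeErrors(sampleErrors, sampleInfo);
console.log(summary.join('\n'));
```

Taking the map as a parameter keeps the helper testable without the library and lets unknown types degrade to a placeholder instead of throwing.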