Wobble-bibble validates LLM translation output against a set of strict rules to catch hallucinations, formatting errors, and Arabic script leaks.

Overview

The validation system checks translations for:
  • ID integrity: Invented IDs, duplicates, missing segments
  • Format compliance: Marker syntax, line breaks, structure
  • Script leaks: Arabic characters in output
  • Translation quality: Truncation, length mismatches, empty content
Basic usage:
import { validateTranslationResponse } from 'wobble-bibble';

const segments = [
    { id: 'P1', text: 'نص عربي طويل...' },
    { id: 'P2', text: 'نص آخر...' }
];

const response = 'P1 - A complete translation.\nP2 - Another translation.';

const result = validateTranslationResponse(segments, response);

if (result.errors.length > 0) {
    console.error('Validation failed:', result.errors);
}

Validation error types

Wobble-bibble defines 13 error types. Each has a stable ID and human-readable description:
// From src/types.ts:15-28
export type ValidationErrorType =
    | 'invalid_marker_format'
    | 'no_valid_markers'
    | 'newline_after_id'
    | 'duplicate_id'
    | 'invented_id'
    | 'missing_id_gap'
    | 'collapsed_speakers'
    | 'truncated_segment'
    | 'arabic_leak'
    | 'empty_parentheses'
    | 'length_mismatch'
    | 'all_caps'
    | 'multiword_translit_without_gloss';

ID integrity errors

These catch LLM hallucinations where segment IDs are invented, duplicated, or skipped.
Description (invented_id): The response contains a segment ID that doesn’t exist in the source corpus.
// From src/validation.ts:423-438
const validateInventedIds = (context: ValidationContext): ValidationError[] => {
    const errors: ValidationError[] = [];
    for (const marker of context.markers) {
        if (!context.segmentById.has(marker.id)) {
            errors.push(
                makeErrorFromRawRange(
                    'invented_id',
                    `Invented ID detected: "${marker.id}" - this ID does not exist in the source`,
                    marker.headerText,
                    { end: marker.rawEnd, start: marker.rawStart },
                    marker.id,
                ),
            );
        }
    }
    return errors;
};
Example failure:
Source: P1, P2
Output: P1 - Text\nP3 - Invented!
Error: invented_id on "P3"
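The invented-ID check above is a membership test against the source IDs. The other two ID integrity checks (duplicate_id and missing_id_gap) are not shown on this page, so the following is only a minimal sketch of how they could work, based on their one-line descriptions. The helper names `findDuplicateIds` and `findMissingGaps` are hypothetical and not part of the library.

```typescript
// Hypothetical helpers for illustration; not wobble-bibble's implementation.

/** Returns each ID that appears more than once, listed once, in first-repeat order. */
function findDuplicateIds(responseIds: string[]): string[] {
    const seen = new Set<string>();
    const duplicates: string[] = [];
    for (const id of responseIds) {
        if (seen.has(id) && duplicates.indexOf(id) === -1) {
            duplicates.push(id);
        }
        seen.add(id);
    }
    return duplicates;
}

/**
 * Returns source IDs skipped between two consecutive response IDs.
 * Assumes sourceIds is in corpus order. IDs unknown to the source are
 * ignored here because a separate rule (invented_id) reports them.
 */
function findMissingGaps(sourceIds: string[], responseIds: string[]): string[] {
    const missing: string[] = [];
    let previousIndex = -1;
    for (const id of responseIds) {
        const index = sourceIds.indexOf(id);
        if (index === -1) continue;
        if (previousIndex !== -1 && index > previousIndex + 1) {
            missing.push(...sourceIds.slice(previousIndex + 1, index));
        }
        previousIndex = index;
    }
    return missing;
}

const corpusIds = ['P1', 'P2', 'P3', 'P4'];
console.log(findDuplicateIds(['P1', 'P2', 'P1']));     // ['P1']
console.log(findMissingGaps(corpusIds, ['P1', 'P4'])); // ['P2', 'P3']
```

Note that this sketch only reports gaps between IDs that are present; segments missing from the start or end of the response would need a separate comparison against the full source list.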

Format errors

These catch malformed markers and structural issues.
Description (invalid_marker_format): A segment marker line is malformed (wrong ID shape or missing content after the dash). Marker IDs are validated against the marker pattern:
// From src/constants.ts:36-38
export const MARKER_ID_PATTERN = 
    `${TRANSLATION_MARKER_PARTS.markers}${TRANSLATION_MARKER_PARTS.digits}${TRANSLATION_MARKER_PARTS.suffix}?`;
// Matches: P1234, B45a, T678, etc.
Examples:
P1234$ - Invalid ($ character in the ID)
P1234a - Valid
P1234 - (invalid: empty after dash)
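The values of TRANSLATION_MARKER_PARTS are not shown on this page, so the sketch below substitutes assumed values inferred from the sample IDs (P1234, B45a, T678): one uppercase prefix letter, one or more digits, and an optional lowercase suffix. The names `ASSUMED_MARKER_PARTS` and `isValidMarkerLine` are hypothetical, and the library's real pattern may accept a different set of prefixes or suffixes.

```typescript
// Assumed stand-ins for TRANSLATION_MARKER_PARTS; the real values may differ.
const ASSUMED_MARKER_PARTS = {
    markers: '[A-Z]', // one uppercase prefix letter, e.g. P, B, T
    digits: '\\d+',   // one or more digits
    suffix: '[a-z]',  // single lowercase letter, made optional below
};

const assumedIdPattern =
    `${ASSUMED_MARKER_PARTS.markers}${ASSUMED_MARKER_PARTS.digits}${ASSUMED_MARKER_PARTS.suffix}?`;

// A full marker line: ID, " - ", then content starting with a non-whitespace character.
const assumedMarkerLine = new RegExp(`^(${assumedIdPattern}) - (\\S.*)$`);

function isValidMarkerLine(line: string): boolean {
    return assumedMarkerLine.test(line);
}

console.log(isValidMarkerLine('P1234a - Some text')); // true
console.log(isValidMarkerLine('P1234$ - Some text')); // false: "$" is not part of an ID
console.log(isValidMarkerLine('P1234 - '));           // false: nothing after the dash
```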

Content errors

These catch translation quality issues and script leaks.
Description (arabic_leak): Arabic script detected in output (except ﷺ). Detection uses Unicode ranges:
// From src/validation.ts:560
const arabicPattern = /[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDF9\uFDFB-\uFDFF\uFE70-\uFEFF]+/g;
Covers:
  • \u0600-\u06FF - Arabic block
  • \u0750-\u077F - Arabic Supplement
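  • \uFB50-\uFDFF - Arabic Presentation Forms-A, except \uFDFA (ﷺ), which the pattern skips by splitting the range into \uFB50-\uFDF9 and \uFDFB-\uFDFF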
  • \uFE70-\uFEFF - Arabic Presentation Forms-B
The honorific ﷺ is the only allowed Arabic character. All other Arabic script is forbidden, even in quotes or parentheses.
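To see the pattern in action, here is a small sketch that reuses the character class quoted above and reports each leak with its offsets. The function name `findArabicLeaks` and the `LeakRange` type are hypothetical; the library's own rule may collect matches differently.

```typescript
type LeakRange = { start: number; end: number; text: string };

// Hypothetical helper for illustration; not wobble-bibble's implementation.
function findArabicLeaks(text: string): LeakRange[] {
    // Same character class as the pattern quoted above; \uFDFA (ﷺ) falls in the gap
    // between \uFDF9 and \uFDFB, so it never matches.
    const arabicPattern = /[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDF9\uFDFB-\uFDFF\uFE70-\uFEFF]+/g;
    const leaks: LeakRange[] = [];
    let match: RegExpExecArray | null;
    while ((match = arabicPattern.exec(text)) !== null) {
        // start is inclusive, end is exclusive
        leaks.push({ start: match.index, end: match.index + match[0].length, text: match[0] });
    }
    return leaks;
}

console.log(findArabicLeaks('The Prophet \uFDFA said').length); // 0
console.log(findArabicLeaks('He said \u0646\u0635 here'));
// [ { start: 8, end: 10, text: 'نص' } ]
```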

Validation result

The validator returns a structured result:
// From src/types.ts:115-117
export type ValidationResponseResult = { 
    normalizedResponse: string;  // Cleaned response text
    parsedIds: string[];         // IDs found in response
    errors: ValidationError[]    // All validation errors
};
Each error includes:
// From src/types.ts:63-73
export type ValidationError = {
    type: ValidationErrorType;   // Error category
    message: string;             // Human-readable message
    range: Range;                // Character offsets (start/end)
    matchText: string;           // The problematic text
    id?: string;                 // Associated segment ID
    ruleId?: string;             // Specific rule identifier
};
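Because `id` is optional, a consumer may want to separate errors tied to a specific segment (for example, to retranslate only that segment) from errors that concern the response as a whole. The sketch below shows one way to do that. It uses a local stand-in type that mirrors the fields above, and `partitionErrors` is a hypothetical helper, not part of the library.

```typescript
// Local stand-in mirroring the ValidationError shape shown above.
type SketchValidationError = {
    type: string;
    message: string;
    range: { start: number; end: number };
    matchText: string;
    id?: string;
};

type PartitionedErrors = {
    bySegment: Map<string, SketchValidationError[]>;
    responseWide: SketchValidationError[];
};

// Hypothetical helper for illustration; not wobble-bibble's API.
function partitionErrors(errors: SketchValidationError[]): PartitionedErrors {
    const bySegment = new Map<string, SketchValidationError[]>();
    const responseWide: SketchValidationError[] = [];
    for (const error of errors) {
        if (error.id === undefined) {
            // No segment ID attached: treat as a response-level problem.
            responseWide.push(error);
            continue;
        }
        const bucket = bySegment.get(error.id) || [];
        bucket.push(error);
        bySegment.set(error.id, bucket);
    }
    return { bySegment, responseWide };
}
```

With the result of `validateTranslationResponse`, this would be called as `partitionErrors(result.errors)`; the keys of `bySegment` are then the segments worth retrying individually.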

Error ranges

Each error includes precise character offsets:
// From src/types.ts:31-33
export type Range = { 
    start: number;  // Inclusive
    end: number     // Exclusive
};
Use ranges to highlight errors in a UI:
const result = validateTranslationResponse(segments, response);

for (const error of result.errors) {
    const snippet = response.slice(error.range.start, error.range.end);
    console.error(`${error.type} at [${error.range.start}:${error.range.end}]: ${snippet}`);
}
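For an actual highlight rather than a log line, the offsets can be used to wrap each flagged span in markers. The sketch below is one way to do this for plain text, assuming the ranges do not overlap; `markRanges` and `TextRange` are hypothetical names, and a real UI would likely emit elements instead of bracket markers.

```typescript
type TextRange = { start: number; end: number }; // start inclusive, end exclusive

// Hypothetical helper for illustration. Assumes ranges do not overlap.
function markRanges(text: string, ranges: TextRange[], open = '[[', close = ']]'): string {
    // Work from the last range to the first so earlier offsets stay valid
    // while markers are being inserted.
    const sorted = ranges.slice().sort((a, b) => b.start - a.start);
    let output = text;
    for (const { start, end } of sorted) {
        output = output.slice(0, start) + open + output.slice(start, end) + close + output.slice(end);
    }
    return output;
}

console.log(markRanges('P1 - Text\nP3 - Invented!', [{ start: 10, end: 12 }]));
// P1 - Text
// [[P3]] - Invented!
```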

Validation configuration

Some rules accept configuration:
// From src/types.ts:81-83
export type ValidationConfig = {
    allCapsWordRunThreshold: number;  // Min consecutive caps words
};
Customize validation:
const result = validateTranslationResponse(
    segments,
    response,
    {
        config: {
            allCapsWordRunThreshold: 10  // Default: 5
        }
    }
);
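The all_caps rule itself is not shown on this page. As a rough illustration of what a "minimum consecutive caps words" threshold could mean, the sketch below measures the longest run of all-caps words and compares it to the threshold. Both function names are hypothetical, and the library's real rule may tokenize words or treat short words differently.

```typescript
// Hypothetical sketch of a run-length check; not wobble-bibble's implementation.
function longestAllCapsRun(text: string): number {
    const words = text.split(/\s+/).filter((word) => word.length > 0);
    let longest = 0;
    let current = 0;
    for (const word of words) {
        // Ignore punctuation and digits when deciding whether a word is all caps.
        const letters = word.replace(/[^A-Za-z]/g, '');
        const isAllCaps = letters.length > 0 && letters === letters.toUpperCase();
        current = isAllCaps ? current + 1 : 0;
        if (current > longest) longest = current;
    }
    return longest;
}

// The default of 5 matches the default noted in the configuration example above.
function exceedsAllCapsThreshold(text: string, allCapsWordRunThreshold = 5): boolean {
    return longestAllCapsRun(text) >= allCapsWordRunThreshold;
}

console.log(exceedsAllCapsThreshold('THIS IS ALL VERY LOUD text'));     // true
console.log(exceedsAllCapsThreshold('THIS IS ALL VERY LOUD text', 10)); // false
```

Raising the threshold, as in the configuration example, makes the check more tolerant of short emphasized phrases or acronym clusters.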

Error descriptions

All error types have human-readable descriptions:
// From src/validation.ts:27-68
export const VALIDATION_ERROR_TYPE_INFO = {
    arabic_leak: {
        description: 'Arabic script was detected in output (except ﷺ).',
    },
    duplicate_id: {
        description: 'The same segment ID appears more than once in the response.',
    },
    invented_id: {
        description: 'The response contains a segment ID that does not exist in the provided source corpus.',
    },
    missing_id_gap: {
        description: 'A gap was detected: the response includes two IDs whose corpus order implies one or more intermediate IDs are missing.',
    },
    // ... etc
} as const satisfies Record<ValidationErrorType, { description: string }>;
Access descriptions programmatically:
import { VALIDATION_ERROR_TYPE_INFO } from 'wobble-bibble';

const desc = VALIDATION_ERROR_TYPE_INFO.arabic_leak.description;
console.log(desc);
// "Arabic script was detected in output (except ﷺ)."
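The descriptions are handy for building a short report grouped by error type. The sketch below uses a local two-entry copy of the descriptions shown above so that it runs on its own; in real code the lookup would go through `VALIDATION_ERROR_TYPE_INFO`. The names `DESCRIPTIONS_SUBSET` and `summarizeByType` are hypothetical.

```typescript
// Local stand-in holding two of the descriptions quoted above.
const DESCRIPTIONS_SUBSET = {
    arabic_leak: { description: 'Arabic script was detected in output (except \uFDFA).' },
    duplicate_id: { description: 'The same segment ID appears more than once in the response.' },
} as const;

type KnownErrorType = keyof typeof DESCRIPTIONS_SUBSET;

// Hypothetical helper: one summary line per error type, in first-seen order.
function summarizeByType(types: KnownErrorType[]): string[] {
    const counts = new Map<KnownErrorType, number>();
    for (const errorType of types) {
        counts.set(errorType, (counts.get(errorType) || 0) + 1);
    }
    const lines: string[] = [];
    counts.forEach((count, errorType) => {
        lines.push(`${errorType} (x${count}): ${DESCRIPTIONS_SUBSET[errorType].description}`);
    });
    return lines;
}

console.log(summarizeByType(['arabic_leak', 'duplicate_id', 'arabic_leak']).join('\n'));
```

With a real validation result, the input would be something like `result.errors.map((error) => error.type)`.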
