Overview
The validation system checks translations for:
- ID integrity: Invented IDs, duplicates, missing segments
- Format compliance: Marker syntax, line breaks, structure
- Script leaks: Arabic characters in output
- Translation quality: Truncation, length mismatches, empty content
Validation error types
Wobble-bibble defines 13 error types. Each has a stable ID and a human-readable description.
ID integrity errors
These catch LLM hallucinations where segment IDs are invented, duplicated, or skipped.
- invented_id
- duplicate_id
- missing_id_gap
Description (invented_id): The response contains a segment ID that doesn’t exist in the source corpus.
Example failure:
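As a rough illustration of the three ID integrity checks, the sketch below compares the IDs found in a response against the source IDs. Everything in it is an assumption made for illustration: the function name `checkIdIntegrity`, the `IdError` shape, the `P1`-style IDs, and the reading of `missing_id_gap` as "a source ID skipped between two translated IDs". It is not wobble-bibble's actual API.

```typescript
// Hypothetical sketch: names and shapes are illustrative assumptions,
// not wobble-bibble's real implementation.
type IdErrorType = "invented_id" | "duplicate_id" | "missing_id_gap";

interface IdError {
  type: IdErrorType;
  id: string;
}

function checkIdIntegrity(sourceIds: string[], responseIds: string[]): IdError[] {
  const errors: IdError[] = [];
  const known = new Set<string>(sourceIds);
  const seen = new Set<string>();

  for (const id of responseIds) {
    if (!known.has(id)) {
      // The ID never appeared in the source corpus.
      errors.push({ type: "invented_id", id });
    } else if (seen.has(id)) {
      // The ID is valid but was already emitted once.
      errors.push({ type: "duplicate_id", id });
    }
    seen.add(id);
  }

  // Assumed gap rule: a source ID is a gap when it is absent from the
  // response but source IDs before and after it are present.
  const present = sourceIds.map((id) => seen.has(id));
  const first = present.indexOf(true);
  const last = present.lastIndexOf(true);
  sourceIds.forEach((id, i) => {
    if (i > first && i < last && !seen.has(id)) {
      errors.push({ type: "missing_id_gap", id });
    }
  });

  return errors;
}

// Source has P1..P4; the response repeats P3, skips P2, and invents P9.
const idErrors = checkIdIntegrity(["P1", "P2", "P3", "P4"], ["P1", "P3", "P3", "P9"]);
console.log(idErrors);
```

Under these assumptions, `P4` is not reported as a gap because nothing after it was translated; a trailing omission would more plausibly fall under a truncation check.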
Format errors
These catch malformed markers and structural issues.
- invalid_marker_format
- newline_after_id
- collapsed_speakers
Description (invalid_marker_format): A segment marker line is malformed (wrong ID shape or missing content after the dash). Each marker line is validated against the marker pattern.
Example failures:
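To make the two failure modes concrete (wrong ID shape, missing content after the dash), here is a minimal sketch. The regular expression is a stand-in chosen for illustration: it assumes IDs look like one uppercase letter, digits, and an optional lowercase suffix, followed by ` - ` and some text. The real marker pattern used by wobble-bibble may differ, and `isValidMarkerLine` is a hypothetical helper.

```typescript
// Hypothetical marker pattern (an assumption, not the library's real one):
//   one uppercase letter, one or more digits, optional lowercase suffix,
//   then " - " and at least one non-space character of content.
const ASSUMED_MARKER_PATTERN = /^[A-Z]\d+[a-z]? - \S/;

function isValidMarkerLine(line: string): boolean {
  return ASSUMED_MARKER_PATTERN.test(line);
}

const markerSamples: string[] = [
  "P12 - Translated text", // well-formed
  "P12a - Translated text", // well-formed, with suffix
  "12P - Translated text", // wrong ID shape
  "P12 -", // missing content after the dash
  "P12- Translated text", // missing space before the dash
];

for (const line of markerSamples) {
  console.log(JSON.stringify(line), isValidMarkerLine(line) ? "ok" : "invalid_marker_format");
}
```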
Content errors
These catch translation quality issues and script leaks.
- arabic_leak
- truncated_segment
- length_mismatch
- empty_parentheses
Description (arabic_leak): Arabic script detected in output (except ﷺ).
Detection uses Unicode ranges, covering:
- \u0600-\u06FF: Arabic block
- \u0750-\u077F: Arabic Supplement
- \uFB50-\uFDFF: Arabic Presentation Forms-A
- \uFE70-\uFEFF: Arabic Presentation Forms-B
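The ranges above translate directly into a character class. The sketch below is one possible way to apply them, not the library's own code: `findArabicLeaks` and `LeakRange` are hypothetical names, and masking ﷺ (U+FDFA, which sits inside the Presentation Forms-A range) before matching is an assumed way to implement the exception.

```typescript
// Hypothetical sketch built from the four ranges listed above.
const ARABIC_CLASS = "[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF]+";
const ALLOWED_LIGATURE = "\uFDFA"; // ﷺ, permitted in output

interface LeakRange {
  start: number; // inclusive UTF-16 offset
  end: number; // exclusive UTF-16 offset
  text: string;
}

function findArabicLeaks(output: string): LeakRange[] {
  // Replace the allowed ligature with a space of the same length so that
  // offsets in the masked string still line up with the original.
  const masked = output.split(ALLOWED_LIGATURE).join(" ");
  const pattern = new RegExp(ARABIC_CLASS, "g");
  const leaks: LeakRange[] = [];

  let match: RegExpExecArray | null = pattern.exec(masked);
  while (match !== null) {
    const text = match[0];
    leaks.push({ start: match.index, end: match.index + text.length, text });
    match = pattern.exec(masked);
  }
  return leaks;
}

// "ab " followed by four Arabic letters, then " cd".
const leakSample = "ab \u0633\u0644\u0627\u0645 cd";
console.log(findArabicLeaks(leakSample));
console.log(findArabicLeaks("The Prophet \uFDFA said"));
```

Returning start and end offsets rather than a boolean lets a caller highlight the exact leaked span in the translated text.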
Validation result
The validator returns a structured result.
Error ranges
Each error includes precise character offsets.
Validation configuration
Some rules accept configuration.
Error descriptions
All error types have human-readable descriptions.
Next steps
- Prompts: Understand the prompt system
- Stacking: Learn how rules are combined