Utility functions provide common text processing operations with special support for Arabic text.
createHints
Creates normalized hints for robust Arabic matching (tolerant of diacritics and punctuation). Hints are used by markTokensWithDividers to insert hard segment breaks at specific multi-word phrases.
Parameters
Either the first hint string, or an options object overriding the default normalization
Remaining hint strings (if the first argument was an options object)
Returns
A normalized hint map plus the normalization settings used for matching
Example
Default Normalization
By default, hints use the following Arabic normalization:
- normalizeAlef: true - Converts أإآ → ا
- normalizeYa: true - Converts ى → ي
- removeTatweel: true - Removes tatweel (ـ)
- normalizeHamza: false - Preserves hamza variations
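The defaults above can be pictured with a small standalone sketch. It does not call the library: the interface, constant, and function names are hypothetical, and only the three character mappings stated above are implemented. The source does not say what normalizeHamza maps to when enabled, so the sketch leaves hamza untouched.

```typescript
// Hypothetical options shape mirroring the four documented flags.
interface ArabicNormalizationSketch {
    normalizeAlef: boolean;
    normalizeYa: boolean;
    removeTatweel: boolean;
    normalizeHamza: boolean;
}

// The documented defaults.
const DEFAULTS_SKETCH: ArabicNormalizationSketch = {
    normalizeAlef: true,
    normalizeYa: true,
    removeTatweel: true,
    normalizeHamza: false,
};

function applyArabicNormalizationSketch(
    text: string,
    options: ArabicNormalizationSketch = DEFAULTS_SKETCH,
): string {
    let result = text;
    if (options.normalizeAlef) {
        // U+0623, U+0625, U+0622 (alef variants) -> U+0627 (bare alef)
        result = result.replace(/[\u0623\u0625\u0622]/g, "\u0627");
    }
    if (options.normalizeYa) {
        // U+0649 (alef maqsura) -> U+064A (ya)
        result = result.replace(/\u0649/g, "\u064A");
    }
    if (options.removeTatweel) {
        // U+0640 (tatweel) is removed
        result = result.replace(/\u0640/g, "");
    }
    // normalizeHamza is intentionally not implemented: its mapping is not
    // described in the source, so this sketch always preserves hamza.
    return result;
}

// "\u0623\u062D\u0645\u0640\u062F" has an alef with hamza and a tatweel.
const normalizedName = applyArabicNormalizationSketch("\u0623\u062D\u0645\u0640\u062F");
```

With these defaults, two spellings that differ only in alef form, final ya, or tatweel normalize to the same string.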
formatSecondsToTimestamp
Formats seconds into a human-readable timestamp.
Parameters
The time duration in seconds
Returns
Formatted timestamp string:
- For durations less than an hour: m:ss (e.g., “1:05”)
- For durations an hour or longer: h:mm:ss (e.g., “1:02:05”)
Example
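A minimal standalone sketch of the two documented output formats. It is not the library's implementation: formatTimestampSketch is a hypothetical name, and the sketch assumes a non-negative input and drops fractional seconds.

```typescript
function formatTimestampSketch(seconds: number): string {
    // Assumption: fractional seconds are dropped and input is non-negative.
    const total = Math.floor(seconds);
    const hours = Math.floor(total / 3600);
    const minutes = Math.floor((total % 3600) / 60);
    const secs = total % 60;

    // Zero-pad a value to two digits.
    const pad = (n: number): string => (n < 10 ? "0" + n : String(n));

    if (hours > 0) {
        // h:mm:ss for durations of an hour or longer
        return hours + ":" + pad(minutes) + ":" + pad(secs);
    }
    // m:ss for durations under an hour
    return minutes + ":" + pad(secs);
}

const shortForm = formatTimestampSketch(65);   // "1:05"
const longForm = formatTimestampSketch(3725);  // "1:02:05"
```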
isEndingWithPunctuation
Checks if a text string ends with sentence-ending punctuation. Supports both English and Arabic punctuation marks.
Parameters
The text to check for ending punctuation
Returns
true if the text ends with punctuation, false otherwise
Supported Punctuation
- Period: .
- Question mark: ? or ؟ (Arabic)
- Exclamation: !
- Arabic semicolon: ؛
- Ellipsis: …
Example
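A standalone sketch of the check, built only from the supported punctuation listed above. endsWithPunctuationSketch is a hypothetical name, and the sketch assumes the last character is inspected as-is, with no trimming of trailing whitespace; the library may behave differently there.

```typescript
// The marks listed under Supported Punctuation:
// "." "?" U+061F (Arabic question mark) "!" U+061B (Arabic semicolon) U+2026 (ellipsis)
const SENTENCE_ENDINGS_SKETCH: string[] = [".", "?", "\u061F", "!", "\u061B", "\u2026"];

function endsWithPunctuationSketch(text: string): boolean {
    // charAt returns "" for an empty string, which is not in the list.
    const last = text.charAt(text.length - 1);
    return SENTENCE_ENDINGS_SKETCH.indexOf(last) !== -1;
}

const englishResult = endsWithPunctuationSketch("Hello world.");                   // true
const arabicResult = endsWithPunctuationSketch("\u0645\u0631\u062D\u0628\u0627\u061F"); // true
const noPunctuation = endsWithPunctuationSketch("Hello world");                    // false
```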
tokenizeGroundTruth
Tokenizes ground truth text, ensuring punctuation is attached to words rather than creating separate tokens.
Parameters
The ground truth text to tokenize
Returns
The tokenized ground truth with punctuation properly attached to preceding words
Example
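One way to picture the described behaviour is the standalone sketch below. It is not the library's implementation: the function name is hypothetical, the return type (an array of strings) is an assumption because the source does not state it, and the set of punctuation marks is an illustrative guess.

```typescript
function tokenizeWithAttachedPunctuationSketch(text: string): string[] {
    // Split on whitespace and drop empty pieces.
    const rawTokens = text.split(/\s+/).filter((t) => t.length > 0);

    // Assumed punctuation set: common Latin marks plus Arabic comma (U+060C),
    // Arabic semicolon (U+061B), Arabic question mark (U+061F), ellipsis (U+2026).
    const punctuationOnly = /^[.,!?;:\u060C\u061B\u061F\u2026]+$/;

    const tokens: string[] = [];
    for (const raw of rawTokens) {
        const previous = tokens.length > 0 ? tokens[tokens.length - 1] : undefined;
        if (previous !== undefined && punctuationOnly.test(raw)) {
            // A punctuation-only piece is glued onto the preceding word.
            tokens[tokens.length - 1] = previous + raw;
        } else {
            tokens.push(raw);
        }
    }
    return tokens;
}

const attached = tokenizeWithAttachedPunctuationSketch("Hello , world !");
// attached is ["Hello,", "world!"]
```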
normalizeTokenText
Normalizes token text for Arabic-first matching and mining. This builds on basic normalization (diacritics + trim punctuation) and adds optional Arabic-specific normalizations. Use the same normalization for:
- Mining repeated sequences
- Matching hints against tokens
Parameters
The token text to normalize
Optional Arabic-specific normalizations
Returns
A normalized token string suitable for comparisons
Normalization Process
- Decomposes Unicode characters (NFD normalization)
- Removes zero-width characters
- Removes Arabic diacritics
- Strips leading/trailing punctuation
- Applies optional Arabic-specific normalizations
- Recomposes Unicode characters (NFC normalization)
Example
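The normalization process above can be sketched as a standalone function. This is not the library's code: normalizeTokenSketch and its options shape are hypothetical, the exact character sets (zero-width characters, diacritics, punctuation) are assumptions, and tatweel removal stands in for the optional Arabic-specific step.

```typescript
function normalizeTokenSketch(text: string, options: { removeTatweel?: boolean } = {}): string {
    // 1. Decompose (NFD).
    let result = text.normalize("NFD");

    // 2. Remove zero-width characters (assumed set: U+200B-U+200D, U+2060, U+FEFF).
    result = result.replace(/[\u200B-\u200D\u2060\uFEFF]/g, "");

    // 3. Remove Arabic diacritics (assumed set: U+064B-U+0652 and U+0670).
    result = result.replace(/[\u064B-\u0652\u0670]/g, "");

    // 4. Strip leading/trailing punctuation and whitespace (assumed set).
    result = result.replace(
        /^[\s.,!?;:\u060C\u061B\u061F\u2026]+|[\s.,!?;:\u060C\u061B\u061F\u2026]+$/g,
        "",
    );

    // 5. Optional Arabic-specific step; only tatweel (U+0640) removal is shown.
    if (options.removeTatweel) {
        result = result.replace(/\u0640/g, "");
    }

    // 6. Recompose (NFC).
    return result.normalize("NFC");
}

// A fully vocalized word followed by an Arabic comma.
const bare = normalizeTokenSketch("\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F\u060C");
// bare is "\u0645\u062D\u0645\u062F"
```

Applying the same function to both sides of a comparison is what makes the match tolerant: a vocalized token and an unvocalized hint reduce to the same string.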
Use Cases
- Hint matching: Normalize both hints and tokens for robust matching
- Phrase mining: Normalize text before counting n-gram frequencies
- Search: Normalize search queries and transcript text for better results
- Deduplication: Identify duplicate phrases despite different spellings