Overview
Apostrophe normalization is a global option that treats all apostrophe-like characters as equivalent during matching. This is particularly useful for text that may contain different apostrophe encodings (curly quotes, backticks, modifier letters, Arabic diacritics, etc.). When enabled, a rule defined with a standard apostrophe (') will match all apostrophe-like variants in the input text.
Supported characters
The following characters are treated as equivalent when normalizeApostrophes: true is set:
| Character | Unicode | Name | Example |
|---|---|---|---|
| ' | U+0027 | Standard apostrophe | don't |
| ’ | U+2019 | Right single quotation mark (curly) | don’t |
| ` | U+0060 | Grave accent (backtick) | don`t |
| ʼ | U+02BC | Modifier letter apostrophe | donʼt |
| ʾ | U+02BE | Modifier letter right half ring (hamza) | donʾt |
| ‛ | U+201B | Single high-reversed-9 quotation mark | don‛t |
| ʻ | U+02BB | Modifier letter turned comma | donʻt |
| ʿ | U+02BF | Modifier letter left half ring (ain) | donʿt |
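The eight code points in the table map naturally onto a single character class. The following is a minimal sketch, assuming the pattern is a plain character class over exactly these characters; the name APOSTROPHE_LIKE_GLOBAL and the helper normalizeApostrophes are hypothetical, and the project's actual definition may be written differently.

```typescript
// Hypothetical reconstruction from the table above, not copied from the project source.
// Covers U+0027, U+2019, U+0060, U+02BC, U+02BE, U+201B, U+02BB, U+02BF.
const APOSTROPHE_LIKE_REGEX = /['\u2019`\u02BC\u02BE\u201B\u02BB\u02BF]/;

// Global variant of the same pattern, used for whole-string replacement.
const APOSTROPHE_LIKE_GLOBAL = new RegExp(APOSTROPHE_LIKE_REGEX.source, "g");

// Replace every apostrophe-like character with the standard apostrophe (U+0027).
function normalizeApostrophes(text: string): string {
  return text.replace(APOSTROPHE_LIKE_GLOBAL, "'");
}

console.log(normalizeApostrophes("don\u2019t")); // don't
```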
The normalization regex is defined in src/constants.ts.
Enabling normalization
Apostrophe normalization is controlled by the normalizeApostrophes option in BuildTrieOptions.
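As a rough sketch of what the option looks like, the type below models only the field this page describes. The real BuildTrieOptions type may carry additional fields, and isNormalizationEnabled is a hypothetical helper that only illustrates the documented default of false.

```typescript
// Assumed shape: only normalizeApostrophes is confirmed by this page.
type BuildTrieOptions = {
  /** Treat all apostrophe-like characters as equivalent. Defaults to false. */
  normalizeApostrophes?: boolean;
};

// Hypothetical helper: resolves the option, falling back to the documented default.
function isNormalizationEnabled(options: BuildTrieOptions = {}): boolean {
  return options.normalizeApostrophes ?? false;
}

const enabled: BuildTrieOptions = { normalizeApostrophes: true };
console.log(isNormalizationEnabled(enabled)); // true
console.log(isNormalizationEnabled()); // false
```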
Basic example
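To illustrate the difference the option makes, the sketch below assumes a hypothetical rule shape with a from array (named on this page) and a to replacement (an assumption). It compares the single entry needed with normalization against the expanded list needed without it. The helper expandVariants is illustrative only and substitutes the same variant for every apostrophe in the word.

```typescript
// Hypothetical rule shape: `from` appears on this page, `to` is an assumption.
type Rule = { from: string[]; to: string };

// The eight apostrophe-like characters from the "Supported characters" table.
const APOSTROPHE_VARIANTS = ["'", "\u2019", "`", "\u02BC", "\u02BE", "\u201B", "\u02BB", "\u02BF"];

// With normalization enabled, one entry written with the standard apostrophe suffices.
const withNormalization: Rule = { from: ["don't"], to: "do not" };

// Without it, every encoding has to be listed explicitly.
function expandVariants(word: string): string[] {
  return APOSTROPHE_VARIANTS.map((variant) => word.split("'").join(variant));
}

const withoutNormalization: Rule = { from: expandVariants("don't"), to: "do not" };

console.log(withNormalization.from.length); // 1
console.log(withoutNormalization.from.length); // 8
```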
Without normalizeApostrophes: true, each variant would require a separate entry in the from array.
How it works
During build time
When building the trie with normalizeApostrophes: true:
- Each source word in from arrays has apostrophe-like characters replaced with the standard apostrophe (')
- The normalized form is inserted into the trie
- The buildOptions object is stored at the trie root for reference during search
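The build steps above can be sketched as follows. This is a simplified model and not the library's implementation: the TrieNode and Rule shapes, the replacement field, and the name buildTrieSketch are all assumptions made for illustration.

```typescript
// Assumed types for illustration only.
type BuildTrieOptions = { normalizeApostrophes?: boolean };
type Rule = { from: string[]; to: string };
type TrieNode = {
  children: Map<string, TrieNode>;
  replacement?: string;
  buildOptions?: BuildTrieOptions;
};

// Character class assumed from the "Supported characters" table.
const APOSTROPHE_LIKE_GLOBAL = /['\u2019`\u02BC\u02BE\u201B\u02BB\u02BF]/g;

function buildTrieSketch(rules: Rule[], options: BuildTrieOptions = {}): TrieNode {
  // The options are stored at the root so the search step can consult them later.
  const root: TrieNode = { children: new Map(), buildOptions: options };
  for (const rule of rules) {
    for (const source of rule.from) {
      // Normalize the source word before insertion when the option is enabled.
      const word = options.normalizeApostrophes
        ? source.replace(APOSTROPHE_LIKE_GLOBAL, "'")
        : source;
      let node = root;
      for (const ch of word) {
        let next = node.children.get(ch);
        if (!next) {
          next = { children: new Map() };
          node.children.set(ch, next);
        }
        node = next;
      }
      node.replacement = rule.to;
    }
  }
  return root;
}

// A rule written with a curly apostrophe is stored under the standard one.
const trie = buildTrieSketch([{ from: ["don\u2019t"], to: "do not" }], { normalizeApostrophes: true });
```

Because only the normalized form is stored, two source words that differ only in their apostrophe character share a single path through the trie in this model.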
During search time
When searching with searchAndReplace:
- The algorithm checks if trie.buildOptions?.normalizeApostrophes is true
- For each character in the input text:
  - If it matches APOSTROPHE_LIKE_REGEX, it’s converted to ' for trie lookup
  - The original character in the text is preserved (not modified)
- Matches are found using the normalized lookup character
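The lookup described above can be modelled with a short sketch. It is a simplified stand-in, not the library's searchAndReplace: the node shape, the helper insertSketch, and the longest-match strategy are assumptions, and the sketch ignores word boundaries and other matching options.

```typescript
// Assumed node shape for illustration only.
type TrieNode = {
  children: Map<string, TrieNode>;
  replacement?: string;
  buildOptions?: { normalizeApostrophes?: boolean };
};

// Character class assumed from the "Supported characters" table.
const APOSTROPHE_LIKE_REGEX = /['\u2019`\u02BC\u02BE\u201B\u02BB\u02BF]/;

// Hypothetical helper: insert one already-normalized word into the trie.
function insertSketch(root: TrieNode, word: string, replacement: string): void {
  let node = root;
  for (const ch of word) {
    let next = node.children.get(ch);
    if (!next) {
      next = { children: new Map() };
      node.children.set(ch, next);
    }
    node = next;
  }
  node.replacement = replacement;
}

function searchAndReplaceSketch(trie: TrieNode, text: string): string {
  const normalize = trie.buildOptions?.normalizeApostrophes === true;
  let result = "";
  let i = 0;
  while (i < text.length) {
    let node: TrieNode | undefined = trie;
    let matchEnd = -1;
    let matchValue = "";
    for (let j = i; j < text.length && node; j++) {
      const original = text[j];
      // Only the lookup key is normalized; the input text itself is never modified.
      const lookup = normalize && APOSTROPHE_LIKE_REGEX.test(original) ? "'" : original;
      const next: TrieNode | undefined = node.children.get(lookup);
      if (next && next.replacement !== undefined) {
        matchEnd = j + 1; // remember the longest match seen so far
        matchValue = next.replacement;
      }
      node = next;
    }
    if (matchEnd > 0) {
      result += matchValue;
      i = matchEnd;
    } else {
      result += text[i]; // unmatched characters are copied through unchanged
      i++;
    }
  }
  return result;
}

const trie: TrieNode = { children: new Map(), buildOptions: { normalizeApostrophes: true } };
insertSketch(trie, "don't", "do not");
console.log(searchAndReplaceSketch(trie, "I don\u2019t know, I don`t care"));
// I do not know, I do not care
```

Apostrophe-like characters outside a match are copied through untouched, so text such as "it’s" keeps its curly apostrophe when no rule applies.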
Real-world use cases
Arabic transliteration
Apostrophes often represent Arabic diacritics like hamza (ʾ) and ain (ʿ). Different sources may use different apostrophe encodings.
English contractions
Different text editors and keyboards produce different apostrophe characters.
Performance considerations
Apostrophe normalization has minimal performance impact:
Build time
- Each source word is scanned once for apostrophe-like characters
- Replacement is a simple string operation
- Impact: Negligible for typical rule sets
Search time
- Each input character is tested against APOSTROPHE_LIKE_REGEX if normalization is enabled
- Unicode regex test is efficient (single character match)
- Only affects characters that match the pattern
- Impact: Minimal overhead, typically <5% in real-world text
Memory
- No additional memory overhead
- Trie stores only normalized forms
- Impact: None
Benchmark results show that searchAndReplace with apostrophe normalization completes in approximately 71 microseconds for typical inputs (see README performance section).
Combining with other features
With case insensitivity
With clipping patterns
Without normalization
If you need to distinguish between different apostrophe types, set normalizeApostrophes: false (or omit it, as it defaults to false).
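One case where this matters is transliteration that must keep ain (ʿ) and hamza (ʾ) apart. The sketch below uses a hypothetical matchesRule helper as a stand-in for a rule lookup; it is not part of the library and only contrasts exact comparison with normalized comparison.

```typescript
// Character class assumed from the "Supported characters" table.
const APOSTROPHE_LIKE_GLOBAL = /['\u2019`\u02BC\u02BE\u201B\u02BB\u02BF]/g;

// Hypothetical stand-in for a rule lookup: exact comparison unless normalization is on.
function matchesRule(source: string, input: string, normalizeApostrophes = false): boolean {
  if (!normalizeApostrophes) {
    return source === input;
  }
  const fold = (s: string): string => s.replace(APOSTROPHE_LIKE_GLOBAL, "'");
  return fold(source) === fold(input);
}

const ruleSource = "\u02BFAli"; // written with ain (U+02BF)
const inputWord = "\u02BEAli"; // written with hamza (U+02BE)

console.log(matchesRule(ruleSource, inputWord)); // false: the two marks stay distinct
console.log(matchesRule(ruleSource, inputWord, true)); // true: both fold to '
```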
Next steps
Rules
Learn more about rule structure and options
Matching options
Explore MatchType and clipping patterns