Overview

Apostrophe normalization is a global option that treats all apostrophe-like characters as equivalent during matching. It is useful for text whose apostrophes may be encoded in different ways (curly quotes, backticks, modifier letters, the half-ring marks used in Arabic transliteration, and so on). When enabled, a rule defined with a standard apostrophe (') matches every apostrophe-like variant in the input text.

Supported characters

The following characters are treated as equivalent when normalizeApostrophes is true:

| Character | Unicode | Name | Example |
|---|---|---|---|
| ' | U+0027 | Standard apostrophe | don't |
| ’ | U+2019 | Right single quotation mark (curly) | don’t |
| ` | U+0060 | Grave accent (backtick) | don`t |
| ʼ | U+02BC | Modifier letter apostrophe | donʼt |
| ʾ | U+02BE | Modifier letter right half ring (hamza) | donʾt |
| ‛ | U+201B | Single high-reversed-9 quotation mark | don‛t |
| ʻ | U+02BB | Modifier letter turned comma | donʻt |
| ʿ | U+02BF | Modifier letter left half ring (ain) | donʿt |

The normalization regex is defined in src/constants.ts as:
export const APOSTROPHE_LIKE_REGEX = /['''`ʾ‛ʼʻʿ]/u;
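
Several of these characters look nearly identical in most fonts, so a pattern built from explicit code points can be easier to review. The sketch below restates the eight characters from the table using Unicode escapes. It is an illustration only: APOSTROPHE_LIKE_SKETCH and isApostropheLike are hypothetical names, and this is not the library's own constant.

```typescript
// Hypothetical restatement of the table above with explicit code points.
// Assumption: exactly the eight characters listed in the table are covered.
const APOSTROPHE_LIKE_SKETCH =
  /[\u0027\u2019\u0060\u02BC\u02BE\u201B\u02BB\u02BF]/u;

// Returns true when the given single character is apostrophe-like.
// The pattern has no "g" flag, so test() keeps no state between calls.
function isApostropheLike(ch: string): boolean {
  return APOSTROPHE_LIKE_SKETCH.test(ch);
}

console.log(isApostropheLike("\u2019")); // true  (right single quotation mark)
console.log(isApostropheLike("\u02BF")); // true  (left half ring, ain)
console.log(isApostropheLike('"'));      // false (double quote is not covered)
```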

Enabling normalization

Apostrophe normalization is controlled by the normalizeApostrophes option in BuildTrieOptions:
type BuildTrieOptions = {
  normalizeApostrophes?: boolean;
};

Basic example

import { buildTrie, searchAndReplace } from 'trie-rules';

const rules = [
  {
    from: ["don't"],
    to: "do not"
  }
];

// Build trie with normalization enabled
const trie = buildTrie(rules, { normalizeApostrophes: true });

// All of these will match:
console.log(searchAndReplace(trie, "don't worry"));
// Output: "do not worry"

console.log(searchAndReplace(trie, "don’t worry"));  // curly quote
// Output: "do not worry"

console.log(searchAndReplace(trie, "don`t worry"));  // backtick
// Output: "do not worry"

console.log(searchAndReplace(trie, "donʾt worry"));  // hamza
// Output: "do not worry"
Without normalizeApostrophes: true, each variant would require a separate entry in the from array.

How it works

During build time

When building the trie with normalizeApostrophes: true:
  1. Each source word in from arrays has apostrophe-like characters replaced with the standard apostrophe (')
  2. The normalized form is inserted into the trie
  3. The buildOptions object is stored at the trie root for reference during search
// src/trie.ts - buildTrie function
for (let source of sources) {
  source = normalizeApostrophes 
    ? source.replace(APOSTROPHE_LIKE_REGEX, "'") 
    : source;
  
  // ... insert into trie
}
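
As a standalone illustration of the build-time step, the sketch below normalizes a source string before it would be inserted. One detail worth knowing: String.prototype.replace with a regex that has no g flag substitutes only the first match, so this sketch uses a global pattern to cover sources with more than one apostrophe-like character. The names APOSTROPHE_LIKE_GLOBAL and normalizeSource are hypothetical, and how the library itself handles repeated apostrophes is not shown here.

```typescript
// Assumption: a global ("g") pattern over the eight characters in the table,
// so that every occurrence in the source is replaced, not just the first.
const APOSTROPHE_LIKE_GLOBAL = /['\u2019`\u02BC\u02BE\u201B\u02BB\u02BF]/gu;

// Hypothetical helper: map every apostrophe-like character to U+0027.
function normalizeSource(source: string): string {
  return source.replace(APOSTROPHE_LIKE_GLOBAL, "'");
}

console.log(normalizeSource("Qur\u02BEan"));      // "Qur'an"
console.log(normalizeSource("rock\u2019n`roll")); // "rock'n'roll"
```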

During search time

When searching with searchAndReplace:
  1. The algorithm checks if trie.buildOptions?.normalizeApostrophes is true
  2. For each character in the input text:
    • If it matches APOSTROPHE_LIKE_REGEX, it’s converted to ' for trie lookup
    • The original character in the text is preserved (not modified)
  3. Matches are found using the normalized lookup character
// src/trie.ts - searchAndReplace function
let lookupChar = currentChar;

// If apostrophe normalization is enabled, normalize apostrophe-like chars
if (normalizeApostrophes && APOSTROPHE_LIKE_REGEX.test(currentChar)) {
  lookupChar = "'";
}

if (!node[lookupChar]) {
  break;
}
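Normalization affects only how the input is matched; it never rewrites text on its own. When a rule matches, the matched span, including whichever apostrophe variant it contained, is replaced by the rule’s replacement value. Apostrophe-like characters outside a match are left exactly as they appeared in the input.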
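
The build and search steps above can be put together in a small self-contained sketch. It is a toy model under stated assumptions, not the trie-rules implementation: SketchNode, buildSketchTrie and replaceWithSketchTrie are hypothetical names, normalization is always on, the longest match wins, and match types, casing and clipping are ignored.

```typescript
// Assumption: the eight apostrophe-like characters from the table.
const APOSTROPHE_LIKE = /['\u2019`\u02BC\u02BE\u201B\u02BB\u02BF]/u;
const APOSTROPHE_LIKE_ALL = /['\u2019`\u02BC\u02BE\u201B\u02BB\u02BF]/gu;

interface SketchNode {
  children: Map<string, SketchNode>;
  replacement?: string; // set on nodes that end a source word
}

// Build time: normalize each source word, then insert it character by character.
function buildSketchTrie(rules: Array<{ from: string[]; to: string }>): SketchNode {
  const root: SketchNode = { children: new Map() };
  for (const rule of rules) {
    for (const source of rule.from) {
      const normalized = Array.from(source.replace(APOSTROPHE_LIKE_ALL, "'"));
      let node = root;
      for (const ch of normalized) {
        let next = node.children.get(ch);
        if (!next) {
          next = { children: new Map() };
          node.children.set(ch, next);
        }
        node = next;
      }
      node.replacement = rule.to;
    }
  }
  return root;
}

// Search time: only the lookup character is normalized; unmatched input
// characters are copied to the output unchanged.
function replaceWithSketchTrie(root: SketchNode, text: string): string {
  const chars = Array.from(text);
  let out = "";
  let i = 0;
  while (i < chars.length) {
    let node = root;
    let matchEnd = -1;
    let matchValue = "";
    for (let j = i; j < chars.length; j++) {
      const lookupChar = APOSTROPHE_LIKE.test(chars[j]) ? "'" : chars[j];
      const next = node.children.get(lookupChar);
      if (!next) break;
      node = next;
      if (node.replacement !== undefined) {
        matchEnd = j + 1; // remember the longest match seen so far
        matchValue = node.replacement;
      }
    }
    if (matchEnd >= 0) {
      out += matchValue;
      i = matchEnd;
    } else {
      out += chars[i];
      i += 1;
    }
  }
  return out;
}

const sketchTrie = buildSketchTrie([{ from: ["don't"], to: "do not" }]);
console.log(replaceWithSketchTrie(sketchTrie, "don\u2019t worry")); // "do not worry"
console.log(replaceWithSketchTrie(sketchTrie, "it`s fine"));        // "it`s fine"
```

The second call shows the preservation rule: the backtick is not part of any match, so it passes through untouched.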

Real-world use cases

Arabic transliteration

In transliterated Arabic, apostrophe-like characters often stand for the letters hamza (ʾ) and ain (ʿ). Different sources encode them with different characters:
import { buildTrie, searchAndReplace, MatchType } from 'trie-rules';

const rules = [
  {
    from: ["al-Qur'an"],
    to: 'al-Qurʾān',
    options: { match: MatchType.Whole }
  },
  {
    from: ["Ka'bah"],
    to: 'Kaʿbah'
  }
];

const trie = buildTrie(rules, { normalizeApostrophes: true });

// All of these match correctly:
console.log(searchAndReplace(trie, "The recitation of al-Qur'an"));
// Output: "The recitation of al-Qurʾān"

console.log(searchAndReplace(trie, 'We went by the Ka`bah yesterday.'));
// Output: "We went by the Kaʿbah yesterday."

console.log(searchAndReplace(trie, 'The holy al-Qurʾan and sacred Kaʿbah'));
// Output: "The holy al-Qurʾān and sacred Kaʿbah"

English contractions

Different text editors and keyboards produce different apostrophe characters:
const rules = [
  { from: ["don't"], to: "do not" },
  { from: ["can't"], to: "cannot" },
  { from: ["won't"], to: "will not" }
];

const trie = buildTrie(rules, { normalizeApostrophes: true });

const text = "I don't think we can't do this, but I won't give up.";
console.log(searchAndReplace(trie, text));
// Output: "I do not think we cannot do this, but I will not give up."

Performance considerations

Apostrophe normalization has minimal performance impact:

Build time
  • Each source word is scanned once for apostrophe-like characters
  • Replacement is a simple string operation
  • Impact: Negligible for typical rule sets

Search time
  • Each input character is tested against APOSTROPHE_LIKE_REGEX if normalization is enabled
  • Unicode regex test is efficient (single character match)
  • Only affects characters that match the pattern
  • Impact: Minimal overhead, typically <5% in real-world text

Memory
  • No additional memory overhead
  • Trie stores only normalized forms
  • Impact: None

Benchmark results show that searchAndReplace with apostrophe normalization completes in approximately 71 microseconds for typical inputs (see README performance section).
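
To get a rough feel for the per-character test cost on your own machine, a timing loop like the one below can help. It is a sketch under assumptions: timeApostropheScan is a hypothetical helper, it measures only the regex test and not the library's trie traversal, and it is not the benchmark referenced in the README. Timings vary by runtime and hardware, so no expected figures are given.

```typescript
// Assumption: the eight apostrophe-like characters from the table.
const APOSTROPHE_LIKE_TIMED = /['\u2019`\u02BC\u02BE\u201B\u02BB\u02BF]/u;

// Hypothetical helper: test every character of `text`, `iterations` times,
// and report how many tests matched plus the elapsed wall-clock time.
function timeApostropheScan(
  text: string,
  iterations: number
): { matches: number; elapsedMs: number } {
  const started = Date.now();
  let matches = 0;
  for (let round = 0; round < iterations; round++) {
    for (let i = 0; i < text.length; i++) {
      if (APOSTROPHE_LIKE_TIMED.test(text[i])) matches++;
    }
  }
  return { matches, elapsedMs: Date.now() - started };
}

const sample = "I don\u2019t think we can`t do this, but I won't give up. ";
const timing = timeApostropheScan(sample.repeat(100), 100);
console.log(`matches: ${timing.matches}, elapsed: ${timing.elapsedMs} ms`);
```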

Combining with other features

With case insensitivity

import { buildTrie, searchAndReplace, CaseSensitivity } from 'trie-rules';

const rules = [
  {
    from: ["don't"],
    to: "do not",
    options: {
      casing: CaseSensitivity.Insensitive
    }
  }
];

const trie = buildTrie(rules, { normalizeApostrophes: true });

// Matches all case and apostrophe variants:
console.log(searchAndReplace(trie, "Don't worry"));
// Output: "Do not worry"

console.log(searchAndReplace(trie, "DON'T WORRY"));
// Output: "DO NOT WORRY"

console.log(searchAndReplace(trie, "don`t worry"));
// Output: "do not worry"

With clipping patterns

import { buildTrie, searchAndReplace, TriePattern } from 'trie-rules';

const rules = [
  {
    from: ["test"],
    to: "result",
    options: {
      clipStartPattern: TriePattern.Apostrophes,
      clipEndPattern: TriePattern.Apostrophes
    }
  }
];

const trie = buildTrie(rules, { normalizeApostrophes: true });

console.log(searchAndReplace(trie, "The 'test' here"));
// Output: "The result here"

console.log(searchAndReplace(trie, "The ʿtest` here"));
// Output: "The result here" (different apostrophes clipped)
When using both apostrophe normalization and clipping patterns, be aware that clipping happens after matching. The TriePattern.Apostrophes pattern will clip any apostrophe-like character, regardless of normalization settings.
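
One way to picture why clipping is independent of normalization: clipping inspects the characters next to an already-found match and tests them against the pattern directly. The sketch below is a guess at the general idea, inferred from the example output above. replaceFirstWithClipping is a hypothetical helper, it handles only the first occurrence of a plain word, and clipping at most one character per side is an assumption.

```typescript
// Assumption: the eight apostrophe-like characters from the table.
const APOSTROPHE_CLIP = /['\u2019`\u02BC\u02BE\u201B\u02BB\u02BF]/u;

// Hypothetical helper: replace the first occurrence of `word`, widening the
// replaced range by one apostrophe-like character on each side if present.
function replaceFirstWithClipping(
  text: string,
  word: string,
  replacement: string
): string {
  const start = text.indexOf(word);
  if (start === -1) return text;
  let from = start;
  let to = start + word.length;
  // All covered characters are single UTF-16 units, so plain indexing is safe.
  if (from > 0 && APOSTROPHE_CLIP.test(text[from - 1])) from -= 1;
  if (to < text.length && APOSTROPHE_CLIP.test(text[to])) to += 1;
  return text.slice(0, from) + replacement + text.slice(to);
}

console.log(replaceFirstWithClipping("The 'test' here", "test", "result"));
// "The result here"
console.log(replaceFirstWithClipping("The \u02BFtest` here", "test", "result"));
// "The result here"
```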

Without normalization

If you need to distinguish between different apostrophe types, set normalizeApostrophes: false (or omit it, as it defaults to false):
const rules = [
  { from: ["don't"], to: "do not" },     // standard apostrophe
  { from: ["don’t"], to: "do not" },     // curly quote
  { from: ["don`t"], to: "do not" }      // backtick
];

const trie = buildTrie(rules); // normalizeApostrophes defaults to false

// Each variant must be explicitly defined in the rules
If you’re working with text from multiple sources with inconsistent apostrophe usage, enabling normalization will significantly reduce the number of rules you need to maintain.
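
To make that maintenance cost concrete, the sketch below enumerates every spelling a single source word would need without normalization. It assumes the eight characters listed in the supported-characters table; APOSTROPHE_VARIANTS and expandApostropheVariants are hypothetical names, not part of the library. A word with k apostrophes needs 8^k entries.

```typescript
// Assumption: the eight apostrophe-like characters from the table.
const APOSTROPHE_VARIANTS = [
  "'", "\u2019", "`", "\u02BC", "\u02BE", "\u201B", "\u02BB", "\u02BF",
];

// Hypothetical helper: list every variant spelling of `source`, substituting
// each apostrophe-like position with each of the eight characters.
function expandApostropheVariants(source: string): string[] {
  let results: string[] = [""];
  for (const ch of Array.from(source)) {
    const options =
      APOSTROPHE_VARIANTS.indexOf(ch) !== -1 ? APOSTROPHE_VARIANTS : [ch];
    const next: string[] = [];
    for (const prefix of results) {
      for (const option of options) {
        next.push(prefix + option);
      }
    }
    results = next;
  }
  return results;
}

console.log(expandApostropheVariants("don't").length);       // 8
console.log(expandApostropheVariants("rock'n'roll").length); // 64
```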

