What is a trie?
A trie (pronounced “try”) is a tree-like data structure that stores strings in a way that allows for fast prefix-based searching. In trie-rules, the trie stores all source words from your rules, enabling efficient pattern matching during search and replace operations.Why use a trie?
Compared to traditional regex-based approaches, a trie offers several advantages:Performance for large rule sets
Performance for large rule sets
When matching against thousands of patterns, a regex with many alternatives (
pattern1|pattern2|...) becomes inefficient. The regex engine must check each alternative sequentially.A trie structures patterns in a tree, allowing simultaneous comparison of multiple patterns by sharing common prefixes. This leads to significantly faster searches.Predictable time complexity
Predictable time complexity
Regex matching can degrade to exponential time complexity in worst-case scenarios with complex patterns and backtracking.Trie-based search runs in O(n) time, where n is the length of the input text, making performance predictable and consistent.
Custom matching logic
Custom matching logic
Implementing custom rules like
MatchType.Whole or MatchType.Alone with regex requires complex lookahead/lookbehind assertions.With a trie, custom matching logic is straightforward to implement and extend.Longest match prioritization
Longest match prioritization
Regex matches based on the order patterns appear in the alternation. Prioritizing longer matches over shorter ones requires careful pattern ordering.A trie naturally handles this by continuing to traverse deeper until no more matches are possible.
TrieNode structure
Each node in the trie is represented by aTrieNode object:
Key properties
Dynamic character properties
The index signature[key: string] allows each node to have properties representing characters in the source words. Each character points to either:
- Another
TrieNodefor nested characters undefinedif the character path doesn’t exist
isEndOfWord
A boolean flag indicating whether the current node marks the end of a complete word from the rules. Whentrue, a match has been found.
target
The replacement string for the rule at this node. This is theto value from the original Rule object.
options
TheRuleOptions associated with this rule, containing configuration like match, casing, clipStartPattern, etc.
buildOptions
Global options stored at the trie root. Only present on the root node. Used to pass configuration likenormalizeApostrophes to search operations.
Building the trie
ThebuildTrie function constructs the trie from an array of rules:
Build process
For each rule:- Iterate through source words in the
fromarray - Apply transformations:
- Normalize apostrophes if
normalizeApostrophesis enabled - Generate case variants if
casing: CaseSensitivity.Insensitive
- Normalize apostrophes if
- Insert into trie by traversing character by character:
- Create new nodes for characters that don’t exist
- Reuse existing nodes for shared prefixes
- Mark the end by setting
isEndOfWord: trueand storing thetargetandoptions
The build process automatically handles case-insensitive rules by generating lowercase, uppercase, and capitalized variants, so lookups remain fast without runtime case conversion.
Example: Building a simple trie
Searching the trie
DuringsearchAndReplace, the algorithm:
- Starts at position i in the input text
- Traverses the trie character by character:
- If
normalizeApostrophesis enabled, apostrophe-like characters are normalized to'during lookup - If a character doesn’t exist in the current node, the traversal stops
- If
- Tracks valid matches when encountering nodes with
isEndOfWord: true - Validates the match using
isValidMatch()to check boundary conditions based onMatchType - Applies the longest valid match and continues from the end of the replacement
Search complexity
Let:l= length of input textk= average length of source words
- The trie allows quick elimination of non-matching paths
- Not every character leads to a potential match
- Shared prefixes reduce redundant comparisons
Example: Apostrophe normalization
WhennormalizeApostrophes: true is set:
"don't"is normalized to"don't"(standard apostrophe)
- All apostrophe-like characters (
',',`,ʼ,ʾ,ʿ) in the input are treated as' - Input
"don't","dont”, and”don’t”` all match the same trie path
Normalization happens at lookup time for the input text, so the trie structure only stores the normalized form.
Runtime complexity summary
buildTrie
Let:n= number of rulesm= average number of sources per rulek= average length of source words
searchAndReplace
Let:l= length of input textk= average length of source wordsp= potential matches per position (typically small and constant)
Memory efficiency
While a trie may use more memory than a compact regex pattern, memory usage is:- Predictable: Scales linearly with the total number of unique character paths
- Efficient for shared prefixes: Common prefixes are stored only once
- Stable: No unpredictable memory spikes during matching (unlike regex with capturing groups)
Next steps
Matching options
Learn how MatchType affects when rules are applied
Apostrophe normalization
Deep dive into flexible apostrophe matching