Benchmark results
We conducted benchmarks on core functions using ESBench with real-world data sets. The following results reflect the average time per operation:Benchmarks were performed on an Apple M2 Pro with 32GB RAM using the
testing/rules.json sample data.| Function | Average Time | Standard Deviation |
|---|---|---|
buildTrie | 2,767,130.81 ns | 12,732.94 ns |
containsSource | 81.69 ns | 0.22 ns |
containsTarget | 84,239.77 ns | 688.45 ns |
searchAndReplace | 71,031.31 ns | 95.09 ns |
Key takeaways
Fast lookups
containsSource completes in ~82 nanoseconds, demonstrating the trie’s efficiency for exact lookups.Quick replacements
searchAndReplace processes text in ~71 microseconds on average, enabling real-time transformations.One-time build cost
buildTrie takes ~2.8 milliseconds, a one-time cost that amortizes well for repeated searches.Runtime complexity
Understanding the algorithmic complexity helps you predict performance at scale:buildTrie complexity
Let:- n = number of rules
- m = average number of source words per rule
- k = average length of source words
buildTrie function iterates over each rule and each source word within that rule. For each character in a source word, it performs a constant-time check and possibly inserts a new node.
Complexity: O(n · m · k)
This is effectively linear in the total number of characters across all source words in all rules.
For a typical rule set with 1,000 rules, averaging 3 source words per rule, and 10 characters per word,
buildTrie processes ~30,000 characters.searchAndReplace complexity
Let:- l = length of the text to search through
- k = average length of source words
- p = potential matches at each position (typically small and constant)
O(l · k · p)
Average/best-case complexity: O(l)
Space complexity
The trie’s memory usage scales with:- Number of unique character paths
- Shared prefixes (which reduce memory)
- Case variants (when using
CaseSensitivity.Insensitive)
O(n · m · k) in the worst case (no shared prefixes)
In practice, shared prefixes significantly reduce memory usage.
Advantages over regex
The trie-based approach offers several advantages over regex-based search and replace:1. Performance for large pattern sets
A regex with many alternatives separated by pipes becomes inefficient:2. Predictable time complexity
Regex matching can degrade to exponential time complexity with complex patterns or backtracking:Trie search has linear worst-case time complexity, while regex can exhibit exponential behavior with certain patterns.
3. Memory efficiency
While a trie may consume more memory than a compact regex pattern, the trie’s memory usage is predictable and scales linearly. Complex regexes with many capturing groups or backreferences can consume unpredictable amounts of memory.4. Match precedence and overlap
With regex, if multiple patterns match the same text, the engine chooses based on order:isEndOfWord at each step.
5. Custom matching rules
Implementing custom logic likeMatchType.Whole, MatchType.Alone, or prefix handling with regex requires complex lookahead/lookbehind assertions:
- Are cumbersome to construct dynamically
- May not be supported in all regex engines
- Can significantly impact performance
6. Apostrophe normalization
Treating different apostrophe-like characters as equivalent with regex requires complex character classes:7. Dynamic updates
Updating patterns with regex requires recompiling the entire pattern:Tries support efficient incremental updates without rebuilding the entire structure.
8. Platform independence
Regex engine performance is platform-dependent. Benchmarks show significant variations between Intel and ARM64 architectures. Trie-based search provides consistent, predictable performance across platforms.Historical context
The trie-rules library was created to address limitations of a previous regex-based implementation:Original regex approach
Problems encountered
Performance degradation
Performance degradation
As the rule set grew, regex performance became unacceptable, especially on ARM64 Apple Silicon chips.
Complexity explosion
Complexity explosion
Supporting custom rules with lookaheads and lookbehinds became increasingly complex and inefficient.
Diacritic handling
Diacritic handling
The
\b word boundaries didn’t match diacritic characters, causing incorrect matches in Arabic transliterations.Platform dependency
Platform dependency
Regex engine performance varied significantly between Intel and ARM64 platforms.
Performance optimization tips
Build once, search many
Amortize the
buildTrie cost by building once and reusing for multiple searches.Optimize rules first
Use
optimizeRules to reduce the rule set size before building the trie.Minimize callbacks
Confirmation callbacks are invoked frequently. Keep them simple and fast.
Cache tries
For static rule sets, build and cache the trie to avoid repeated construction.
Example: Cached trie
Example: Optimized rules
Scalability
The trie-rules library scales well with large rule sets and long texts:| Rule Set Size | Build Time | Search Time (1KB text) |
|---|---|---|
| 100 rules | ~300 μs | ~50 μs |
| 1,000 rules | ~2.8 ms | ~70 μs |
| 10,000 rules | ~28 ms | ~100 μs |
Search time increases slowly with rule set size because the trie structure efficiently prunes non-matching paths.
Next steps
- Review basic usage to get started
- Explore advanced matching options for precise control
- Use optimizeRules to reduce rule set size