The trie-rules library exports two regular expression constants that are used internally and can be useful for custom text processing.

APOSTROPHE_LIKE_REGEX

A regular expression that matches apostrophe-like characters used in various languages and typographic contexts.
const APOSTROPHE_LIKE_REGEX = /['’‘`ʾ‛ʼʻʿ]/u;
Type: RegExp
Matches any one of the following apostrophe-like characters:
  • ' - Standard apostrophe (U+0027)
  • ’ - Right single quotation mark (U+2019)
  • ‘ - Left single quotation mark (U+2018)
  • ` - Grave accent / backtick (U+0060)
  • ʾ - Modifier letter right half ring (U+02BE)
  • ‛ - Single high-reversed-9 quotation mark (U+201B)
  • ʼ - Modifier letter apostrophe (U+02BC)
  • ʻ - Modifier letter turned comma (U+02BB)
  • ʿ - Modifier letter left half ring (U+02BF)

Usage

This constant is primarily used internally by the apostrophe normalization feature, but you can use it for your own text processing:
import { APOSTROPHE_LIKE_REGEX } from 'trie-rules';

const text = "Don't use fancy apostrophes like don’t or don`t";
// The exported regex has no g flag, so build a global copy to replace every match.
const normalized = text.replace(new RegExp(APOSTROPHE_LIKE_REGEX, 'gu'), "'");

console.log(normalized);
// Output: "Don't use fancy apostrophes like don't or don't"
When building a trie with normalizeApostrophes: true, this regex is used to convert all apostrophe-like characters to the standard apostrophe ' for consistent matching.
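To illustrate why consistent apostrophes matter for matching, here is a minimal sketch that normalizes both a stored key and the incoming text before a lookup. It does not reproduce the library's internals: the regex is redefined locally from the character list above so the snippet runs on its own, and `toStandardApostrophes` and the `rules` map are hypothetical names used only for illustration.

```javascript
// Redefined locally for a standalone example; in a project, import it from 'trie-rules'.
const APOSTROPHE_LIKE_REGEX = /['’‘`ʾ‛ʼʻʿ]/u;

// Hypothetical helper (not a trie-rules export): map every apostrophe-like
// character to the standard apostrophe U+0027.
function toStandardApostrophes(text) {
  const globalRegex = new RegExp(APOSTROPHE_LIKE_REGEX.source, 'gu');
  return text.replace(globalRegex, "'");
}

// Hypothetical lookup table standing in for a set of rules.
// The key is written with U+2019 and normalized on insertion.
const rules = new Map([[toStandardApostrophes('don’t'), 'do not']]);

// The user typed a backtick instead of an apostrophe.
const typedWithBacktick = 'don`t';

const withoutNormalization = rules.get(typedWithBacktick);
const withNormalization = rules.get(toStandardApostrophes(typedWithBacktick));

console.log(withoutNormalization); // undefined
console.log(withNormalization); // "do not"
```

Normalizing on both sides means the lookup no longer depends on which apostrophe variant the author or the user happened to type.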

Use cases

  • Text normalization: Standardize apostrophes before processing
  • Custom validation: Check if text contains variant apostrophes
  • Pattern detection: Identify non-standard apostrophe usage in user input
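The validation and detection use cases need one extra step, because the regex also matches the standard apostrophe (U+0027). The sketch below filters that character out and reports only the variants. `findVariantApostrophes` is a hypothetical helper, not part of trie-rules, and the regex is redefined locally so the snippet is self-contained.

```javascript
// Redefined locally for a standalone example; in a project, import it from 'trie-rules'.
const APOSTROPHE_LIKE_REGEX = /['’‘`ʾ‛ʼʻʿ]/u;

// Hypothetical helper: list every apostrophe-like character that is NOT the
// standard apostrophe, with its position and code point.
function findVariantApostrophes(text) {
  const globalRegex = new RegExp(APOSTROPHE_LIKE_REGEX.source, 'gu');
  const found = [];
  for (const match of text.matchAll(globalRegex)) {
    if (match[0] === "'") continue; // U+0027 is already the normalized form
    const hex = match[0].codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
    found.push({ char: match[0], index: match.index, codePoint: `U+${hex}` });
  }
  return found;
}

console.log(findVariantApostrophes("It's fine, but it’s not"));
// [ { char: '’', index: 17, codePoint: 'U+2019' } ]
```

An empty result means the input contains only standard apostrophes (or none at all), which can serve as a simple validation check.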

LETTER_REGEX

A Unicode-aware regular expression that matches any letter character.
const LETTER_REGEX = /\p{L}/u;
Type: RegExp
Matches any Unicode letter character using the Unicode property escape \p{L}. This includes:
  • Latin letters (a-z, A-Z)
  • Accented letters (é, ñ, ü, etc.)
  • Non-Latin scripts (Arabic, Hebrew, Chinese, etc.)
  • All other Unicode letter categories

Usage

This constant is used internally for case detection and word boundary analysis, but you can use it for custom text processing:
import { LETTER_REGEX } from 'trie-rules';

const text = "Hello 世界! مرحبا";
const letters = text.match(new RegExp(LETTER_REGEX, 'gu'));

console.log(letters);
// Output: ['H', 'e', 'l', 'l', 'o', '世', '界', 'م', 'ر', 'ح', 'ب', 'ا']

Use cases

  • Multilingual text processing: Detect letters in any language
  • Custom tokenization: Split text while preserving Unicode letters
  • Validation: Check if characters are alphabetic across all scripts
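The tokenization and validation use cases can be sketched as follows. Both helpers are hypothetical and not part of trie-rules; the regex is redefined locally so the snippet runs without the package installed.

```javascript
// Redefined locally for a standalone example; in a project, import it from 'trie-rules'.
const LETTER_REGEX = /\p{L}/u;

// Hypothetical helper: true when the string is non-empty and every code point is a letter.
// Array.from iterates by code point, so characters outside the BMP are handled correctly.
function isAlphabetic(text) {
  return text.length > 0 && Array.from(text).every((ch) => LETTER_REGEX.test(ch));
}

// Hypothetical helper: extract runs of consecutive letters as simple tokens.
function letterRuns(text) {
  const runRegex = new RegExp(`(?:${LETTER_REGEX.source})+`, 'gu');
  return text.match(runRegex) ?? [];
}

console.log(isAlphabetic('мир')); // true
console.log(isAlphabetic('abc123')); // false
console.log(letterRuns('Hello 世界 мир 42')); // [ 'Hello', '世界', 'мир' ]
```

Note that \p{L} covers letters only. Combining marks belong to \p{M}, so text stored in decomposed form (a base letter followed by a separate accent) would need extra handling in a real tokenizer.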
The u flag is required for Unicode property escapes such as \p{L}. When you create a new RegExp from one of these constants and pass a flags argument, the new flags replace the original ones, so include u explicitly (for example 'gu').
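The following sketch shows what happens when the u flag is dropped by accident. It relies only on standard JavaScript RegExp behavior; the regex is redefined locally so the snippet is self-contained.

```javascript
// Redefined locally for a standalone example; in a project, import it from 'trie-rules'.
const LETTER_REGEX = /\p{L}/u;

// Passing a RegExp together with a flags string replaces the original flags,
// so the u flag is lost here.
const lostUnicode = new RegExp(LETTER_REGEX, 'g');
console.log(lostUnicode.flags); // "g"
// Without u, \p{L} is no longer a property escape, so letters do not match.
console.log(lostUnicode.test('é')); // false

// Rebuilding from source and flags keeps u and adds g.
const keptUnicode = new RegExp(LETTER_REGEX.source, LETTER_REGEX.flags + 'g');
console.log(keptUnicode.flags); // "gu"
console.log('aβ1'.match(keptUnicode)); // [ 'a', 'β' ]
```

Reusing `LETTER_REGEX.flags` rather than hard-coding a flags string keeps the copy correct even if the constant's flags change in a later release.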

Apostrophe Normalization

Learn how APOSTROPHE_LIKE_REGEX is used in normalization

Utility Functions

Functions that use these constants internally
