APOSTROPHE_LIKE_REGEX
A regular expression that matches apostrophe-like characters used in various languages and typographic contexts.
It matches any of the following characters:
- ' - Standard apostrophe (U+0027)
- ’ - Right single quotation mark (U+2019)
- ‘ - Left single quotation mark (U+2018)
- ` - Grave accent / backtick (U+0060)
- ʾ - Modifier letter right half ring (U+02BE)
- ‛ - Single high-reversed-9 quotation mark (U+201B)
- ʼ - Modifier letter apostrophe (U+02BC)
- ʻ - Modifier letter turned comma (U+02BB)
- ʿ - Modifier letter left half ring (U+02BF)
Usage
This constant is primarily used internally by the apostrophe normalization feature, but you can also use it in your own text processing.
When building a trie with normalizeApostrophes: true, this regex is used to convert all apostrophe-like characters to the standard apostrophe ' for consistent matching.
Use cases
- Text normalization: Standardize apostrophes before processing
- Custom validation: Check if text contains variant apostrophes
- Pattern detection: Identify non-standard apostrophe usage in user input
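The normalization and detection use cases above can be sketched as follows. This is a minimal illustration, not the library's implementation: APOSTROPHE_LIKE and VARIANT_APOSTROPHE are local stand-ins built from the code points listed above, and the helper names are hypothetical. The flags and exact form of the exported APOSTROPHE_LIKE_REGEX may differ.

```typescript
// Local stand-in for APOSTROPHE_LIKE_REGEX, built from the documented code points.
// Assumption: the library's actual constant may use different flags or ordering.
const APOSTROPHE_LIKE = /[\u0027\u2019\u2018\u0060\u02BE\u201B\u02BC\u02BB\u02BF]/g;

// Same set without U+0027, used to detect non-standard variants only.
// Kept non-global so repeated .test() calls carry no lastIndex state.
const VARIANT_APOSTROPHE = /[\u2019\u2018\u0060\u02BE\u201B\u02BC\u02BB\u02BF]/;

// Replace every apostrophe-like character with the standard apostrophe (U+0027).
function normalizeApostrophes(text: string): string {
  return text.replace(APOSTROPHE_LIKE, "'");
}

// Report whether the text contains an apostrophe variant other than U+0027.
function hasVariantApostrophe(text: string): boolean {
  return VARIANT_APOSTROPHE.test(text);
}

console.log(normalizeApostrophes("don\u2019t")); // don't
console.log(hasVariantApostrophe("don't"));      // false
console.log(hasVariantApostrophe("don\u2019t")); // true
```

The detection regex is deliberately separate from the replacement regex: calling .test() on a regex with the global flag advances its lastIndex, which can give alternating results across calls.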
LETTER_REGEX
A Unicode-aware regular expression that matches any letter character, using the Unicode property escape \p{L}. This includes:
- Latin letters (a-z, A-Z)
- Accented letters (é, ñ, ü, etc.)
- Non-Latin scripts (Arabic, Hebrew, Chinese, etc.)
- All other Unicode letter categories
Usage
This constant is used internally for case detection and word boundary analysis, but you can also use it for custom text processing.
Use cases
- Multilingual text processing: Detect letters in any language
- Custom tokenization: Split text while preserving Unicode letters
- Validation: Check if characters are alphabetic across all scripts
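The validation and tokenization use cases above might look like the sketch below. LETTER and LETTER_RUN are local stand-ins that assume LETTER_REGEX is equivalent to /\p{L}/u; the helper names isLetter and letterRuns are hypothetical and not part of the library.

```typescript
// Local stand-in for LETTER_REGEX.
// Assumption: the exported constant behaves like /\p{L}/u.
// Built with the RegExp constructor so the pattern is not checked against
// the TypeScript compile target.
const LETTER = new RegExp("\\p{L}", "u");

// One or more consecutive letters, for simple tokenization.
const LETTER_RUN = new RegExp("\\p{L}+", "gu");

// Check whether a single character is a letter in any script.
function isLetter(char: string): boolean {
  return LETTER.test(char);
}

// Split text into runs of letters, dropping digits, spaces and punctuation.
function letterRuns(text: string): string[] {
  return text.match(LETTER_RUN) || [];
}

console.log(isLetter("\u00E9")); // true  (é)
console.log(isLetter("7"));      // false
console.log(letterRuns("caf\u00E9 au lait, 42!")); // [ 'café', 'au', 'lait' ]
```

Unicode property escapes require the u (or v) flag; without it, \p{L} is not interpreted as a letter class.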
Related
Apostrophe Normalization
Learn how APOSTROPHE_LIKE_REGEX is used in normalization
Utility Functions
Functions that use these constants internally