Arabic text normalization is essential for accurate search matching. The library provides two normalization functions that handle diacritics, Unicode variants, and character unification.
Overview
The normalization module exports two primary functions:
removeTashkeel() - Removes diacritics and Quranic marks
normalizeArabic() - Advanced normalization for search indexing
Remove diacritics
removeTashkeel(text: string): string
Removes Tashkeel (diacritics) and Quranic marks from Arabic text.
Use case: Stripping diacritics for display or simple comparisons.
import { removeTashkeel } from 'quran-search-engine';
const out = removeTashkeel('بِسْمِ ٱللَّهِ');
// out => 'بسم الله'
What it removes
From src/utils/normalization.ts:7-11:
export const removeTashkeel = (text: string): string => {
  return text
    .replace(/\u0671/g, '\u0627') // Wasl alef → regular alef
    .replace(/[\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06FC]/g, '');
};
The function removes:
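- Wasl alef (\u0671), which is converted to regular alef (\u0627) rather than deleted
- All diacritical marks (Unicode range \u064B-\u065F)
- Quranic annotation marks (\u06D6-\u06DC, \u06DF-\u06E8, \u06EA-\u06FC). The last range also covers the Extended Arabic-Indic digits (\u06F0-\u06F9) and a few extended letters, so those are stripped as well.
- Superscript (dagger) alef (\u0670)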
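To make the affected code points visible, the sketch below reproduces the removeTashkeel() snippet shown above and prints a string before and after stripping. The toCodePoints helper is hypothetical and exists only for this illustration.

```typescript
// Local copy of the snippet shown above, so the example runs on its own.
const removeTashkeel = (text: string): string =>
  text
    .replace(/\u0671/g, '\u0627') // wasl alef → regular alef
    .replace(/[\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06FC]/g, '');

// Hypothetical helper: render a string as a list of U+XXXX code points.
const toCodePoints = (text: string): string[] =>
  Array.from(text, (ch) => {
    const hex = ch.codePointAt(0)!.toString(16).toUpperCase();
    return 'U+' + ('0000' + hex).slice(-4);
  });

// lam, shadda (U+0651), dagger alif (U+0670), wasl alef (U+0671)
const sample = '\u0644\u0651\u0670\u0671';
const stripped = removeTashkeel(sample);

console.log(toCodePoints(sample));   // ['U+0644', 'U+0651', 'U+0670', 'U+0671']
console.log(toCodePoints(stripped)); // ['U+0644', 'U+0627']
```

The shadda and dagger alif disappear, while the wasl alef survives as a regular alef.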
Advanced normalization
normalizeArabic(text: string): string
Advanced Arabic normalization for search indexing. Handles Unicode normalization, variant unification, and cleanup.
Use case: Preparing user input for searching (unifies alef variants, removes tashkeel, etc).
import { normalizeArabic } from 'quran-search-engine';
const out = normalizeArabic('بِسْمِ ٱللَّهِ');
// out => 'بسم الله'
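For the sample above both functions return the same string. They diverge once the input contains letter variants. The sketch below uses local copies of both functions, condensed from the listings on this page, and writes the characters as Unicode escapes so each code point is unambiguous.

```typescript
// Local copies of the two functions documented on this page.
const removeTashkeel = (text: string): string =>
  text
    .replace(/\u0671/g, '\u0627')
    .replace(/[\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06FC]/g, '');

const normalizeArabic = (text: string): string => {
  if (!text) return '';
  return removeTashkeel(text)
    .normalize('NFC')
    .replace(/[\u0670\u0640]/g, '')                // dagger alif + tatweel
    .replace(/[\u0625\u0623\u0622\u0671]/g, '\u0627') // alef variants → alef
    .replace(/[\u0624\u0626\u0621]/g, '\u0621')    // hamza variants → hamza
    .replace(/\u0649/g, '\u064A')                  // alif maqsura → ya
    .replace(/[\r\n]+/g, ' ')
    .replace(/[^\u0621-\u064A\s-]+/g, '')
    .replace(/\s{2,}/g, ' ')
    .trim();
};

// "إِلَىٰ": alef with hamza below, kasra, lam, fatha, alif maqsura, dagger alif
const word = '\u0625\u0650\u0644\u064E\u0649\u0670';

const displayForm = removeTashkeel(word); // keeps إ and ى
const searchForm = normalizeArabic(word); // unifies them to ا and ي

console.log(displayForm === '\u0625\u0644\u0649'); // true
console.log(searchForm === '\u0627\u0644\u064A');  // true
```

The display form keeps the original spelling, which is why removeTashkeel() suits presentation, while the search form trades spelling fidelity for broader matching.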
Normalization steps
Remove diacritics
Calls removeTashkeel() to strip all diacritical marks, then applies Unicode NFC normalization.
let normalizedText = removeTashkeel(text).normalize('NFC');
Remove special characters
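Removes tatweel (the elongation character, \u0640). The pattern also lists dagger alif (\u0670), which removeTashkeel() has already stripped in the previous step, so that part acts only as a safeguard.
// dagger alif + tatweel
normalizedText = normalizedText.replace(/[\u0670\u0640]/g, '');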
Unify alef variants
Normalizes all alef variants to a single form.
// alef variants → ا
normalizedText = normalizedText.replace(/[إأآٱ]/g, 'ا');
Converts: إ أ آ ٱ → ا
Unify hamza variants
Normalizes hamza on waw and ya carriers to standalone hamza. Hamza on alef (أ, إ) has already become plain alef in the previous step.
// hamza variants → ء
normalizedText = normalizedText.replace(/[ؤئء]/g, 'ء');
Converts: ؤ ئ → ء
Unify ya variants
Converts alif maqsura to regular ya.
// alif maqsura → ي
normalizedText = normalizedText.replace(/ى/g, 'ي');
Converts: ى → ي
Clean whitespace and control characters
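Replaces line breaks with a space, removes every character outside the basic Arabic letter range (\u0621-\u064A) other than whitespace and the hyphen, and collapses runs of whitespace into a single space. Latin letters, digits, and punctuation are dropped at this stage.
// remove control chars / CRLF / non-Arabic symbols
normalizedText = normalizedText.replace(/[\r\n]+/g, ' ');
normalizedText = normalizedText.replace(/[^\u0621-\u064A\s-]+/g, '');
normalizedText = normalizedText.replace(/\s{2,}/g, ' ');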
Trim and return
Removes leading and trailing whitespace.
return normalizedText.trim();
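To watch the steps act one after another, the following sketch models them as an ordered list of stages and records the text after each one. The Stage type, the stages array, and traceNormalization are hypothetical names for this illustration, not part of the library, and the sketch leaves out the empty-input guard of the real function.

```typescript
type Stage = { label: string; apply: (text: string) => string };

// Local copy of removeTashkeel() as documented earlier on this page.
const removeTashkeel = (text: string): string =>
  text
    .replace(/\u0671/g, '\u0627')
    .replace(/[\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06FC]/g, '');

// One entry per step, in the documented order.
const stages: Stage[] = [
  { label: 'remove diacritics + NFC', apply: (t) => removeTashkeel(t).normalize('NFC') },
  { label: 'remove dagger alif + tatweel', apply: (t) => t.replace(/[\u0670\u0640]/g, '') },
  { label: 'unify alef variants', apply: (t) => t.replace(/[\u0625\u0623\u0622\u0671]/g, '\u0627') },
  { label: 'unify hamza variants', apply: (t) => t.replace(/[\u0624\u0626\u0621]/g, '\u0621') },
  { label: 'unify ya variants', apply: (t) => t.replace(/\u0649/g, '\u064A') },
  {
    label: 'clean whitespace and symbols',
    apply: (t) =>
      t.replace(/[\r\n]+/g, ' ').replace(/[^\u0621-\u064A\s-]+/g, '').replace(/\s{2,}/g, ' '),
  },
  { label: 'trim', apply: (t) => t.trim() },
];

// Apply every stage in order and keep a snapshot after each one.
const traceNormalization = (text: string): string[] => {
  const snapshots: string[] = [];
  let current = text;
  for (const stage of stages) {
    current = stage.apply(current);
    snapshots.push(current);
    console.log(`${stage.label}: ${JSON.stringify(current)}`);
  }
  return snapshots;
};

// "  أَنْـزَلَ\r\nabc شَيْئًا " with diacritics, a tatweel, a line break, and Latin text
const messy =
  '  \u0623\u064E\u0646\u0652\u0640\u0632\u064E\u0644\u064E\r\nabc \u0634\u064E\u064A\u0652\u0626\u064B\u0627 ';
const snapshots = traceNormalization(messy);
```

The last snapshot is the value normalizeArabic() would return for the same non-empty input, here the two words انزل and شيءا separated by a single space.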
Full implementation
From src/utils/normalization.ts:20-43:
export const normalizeArabic = (text: string): string => {
  if (!text) return '';
  let normalizedText = removeTashkeel(text).normalize('NFC');
  // dagger alif + tatweel
  normalizedText = normalizedText.replace(/[\u0670\u0640]/g, '');
  // alef variants → ا
  normalizedText = normalizedText.replace(/[إأآٱ]/g, 'ا');
  // hamza variants → ء
  normalizedText = normalizedText.replace(/[ؤئء]/g, 'ء');
  // alif maqsura → ي
  normalizedText = normalizedText.replace(/ى/g, 'ي');
  // remove control chars / CRLF / non-Arabic symbols
  normalizedText = normalizedText.replace(/[\r\n]+/g, ' ');
  normalizedText = normalizedText.replace(/[^\u0621-\u064A\s-]+/g, '');
  normalizedText = normalizedText.replace(/\s{2,}/g, ' ');
  return normalizedText.trim();
};
Usage in search
The normalizeArabic() function is used internally throughout the search engine to ensure consistent matching:
import { normalizeArabic } from 'quran-search-engine';
export function containsAllTokens(value: string, query: string): boolean {
  const normalizedQuery = normalizeArabic(query);
  if (!normalizedQuery) return false;
  const tokens = normalizedQuery.split(/\s+/);
  const normalizedValue = normalizeArabic(value);
  return tokens.every((token) => normalizedValue.includes(token));
}
Always normalize both the search query and the text being searched to ensure accurate matching regardless of input variations.
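As a rough illustration of that advice, the sketch below runs containsAllTokens() over a two-verse list. It uses local copies of the functions documented on this page so that it runs standalone. The verses array and the findVerses helper are assumptions made for the example and are not part of the library.

```typescript
// Local copies of the functions documented on this page.
const removeTashkeel = (text: string): string =>
  text
    .replace(/\u0671/g, '\u0627')
    .replace(/[\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06FC]/g, '');

const normalizeArabic = (text: string): string => {
  if (!text) return '';
  return removeTashkeel(text)
    .normalize('NFC')
    .replace(/[\u0670\u0640]/g, '')
    .replace(/[\u0625\u0623\u0622\u0671]/g, '\u0627')
    .replace(/[\u0624\u0626\u0621]/g, '\u0621')
    .replace(/\u0649/g, '\u064A')
    .replace(/[\r\n]+/g, ' ')
    .replace(/[^\u0621-\u064A\s-]+/g, '')
    .replace(/\s{2,}/g, ' ')
    .trim();
};

function containsAllTokens(value: string, query: string): boolean {
  const normalizedQuery = normalizeArabic(query);
  if (!normalizedQuery) return false;
  const tokens = normalizedQuery.split(/\s+/);
  const normalizedValue = normalizeArabic(value);
  return tokens.every((token) => normalizedValue.includes(token));
}

// Illustrative data: the first two verses of Al-Fatiha with full diacritics.
const verses: string[] = [
  'بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ',
  'ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ',
];

// Hypothetical helper: indexes of the verses that contain every query token.
const findVerses = (query: string): number[] =>
  verses
    .map((verse, index) => (containsAllTokens(verse, query) ? index : -1))
    .filter((index) => index !== -1);

console.log(findVerses('الرحيم الله')); // [0], token order does not matter
console.log(findVerses('أَلْحَمْدُ'));     // [1], hamza and diacritics in the query are unified
console.log(findVerses('quran'));        // [], the query normalizes to an empty string
```

Because each token is checked with includes(), a token also matches when it is only part of a longer word. Whether that is desirable depends on the search experience being built.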
Character unification table
| Category | Original Characters | Normalized To |
|---|---|---|
| Alef variants | إ أ آ ٱ | ا |
| Hamza variants | ؤ ئ | ء |
| Ya variants | ى | ي |
| Removed | Diacritics, tatweel, dagger alif | (empty) |
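normalizeArabic() returns an empty string for empty input. Because its guard is if (!text), null or undefined passed at runtime also yields an empty string, even though the declared parameter type is string. removeTashkeel() has no such guard and expects a string.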
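The edge cases can be exercised directly. The sketch below uses local copies of both functions as listed on this page and simulates an untyped caller passing undefined. The safeRemoveTashkeel wrapper is a hypothetical addition for callers that cannot guarantee a string and is not part of the library.

```typescript
// Local copies of the functions documented on this page.
const removeTashkeel = (text: string): string =>
  text
    .replace(/\u0671/g, '\u0627')
    .replace(/[\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06FC]/g, '');

const normalizeArabic = (text: string): string => {
  if (!text) return '';
  return removeTashkeel(text)
    .normalize('NFC')
    .replace(/[\u0670\u0640]/g, '')
    .replace(/[\u0625\u0623\u0622\u0671]/g, '\u0627')
    .replace(/[\u0624\u0626\u0621]/g, '\u0621')
    .replace(/\u0649/g, '\u064A')
    .replace(/[\r\n]+/g, ' ')
    .replace(/[^\u0621-\u064A\s-]+/g, '')
    .replace(/\s{2,}/g, ' ')
    .trim();
};

// Simulate an untyped caller (plain JavaScript, parsed JSON) passing undefined.
const missing = undefined as unknown as string;

const fromMissing = normalizeArabic(missing);  // '' via the if (!text) guard
const fromBlank = normalizeArabic('   ');      // '' after collapsing and trimming
const fromLatin = normalizeArabic('abc 123');  // '' because no Arabic letters survive

// removeTashkeel() calls text.replace() directly, so undefined throws.
let threwTypeError = false;
try {
  removeTashkeel(missing);
} catch (error) {
  threwTypeError = error instanceof TypeError;
}
console.log(threwTypeError); // true

// Hypothetical wrapper that mirrors the guard used by normalizeArabic().
const safeRemoveTashkeel = (text?: string | null): string =>
  text ? removeTashkeel(text) : '';

console.log(JSON.stringify(safeRemoveTashkeel(null))); // ""
```

Since whitespace-only and non-Arabic input also normalize to an empty string, callers may want to treat an empty result as "nothing to search for", as containsAllTokens() does.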