Arabic text normalization is essential for accurate search matching. The library provides two normalization functions that handle diacritics, Unicode variants, and character unification.

Overview

The normalization module exports two primary functions:
  • removeTashkeel() - Removes diacritics and Quranic marks
  • normalizeArabic() - Advanced normalization for search indexing

Remove diacritics

removeTashkeel(text: string): string

Removes Tashkeel (diacritics) and Quranic marks from Arabic text.

Use case: stripping diacritics for display or simple comparisons.
import { removeTashkeel } from 'quran-search-engine';

const out = removeTashkeel('بِسْمِ ٱللَّهِ');
// out => 'بسم الله'

What it removes

From src/utils/normalization.ts:7-11:
export const removeTashkeel = (text: string): string => {
  return text
    .replace(/\u0671/g, '\u0627') // Wasl alef → regular alef
    .replace(/[\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06FC]/g, '');
};
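The function handles:
  • Wasl alef (\u0671): converted to regular alef (\u0627) rather than removed
  • Diacritical marks in the range \u064B-\u065F
  • Superscript (dagger) alef, \u0670
  • Quranic annotation marks in the ranges \u06D6-\u06DC, \u06DF-\u06E8, and \u06EA-\u06FC
Note that the last range extends past the annotation marks: it also covers the Extended Arabic-Indic digits (\u06F0-\u06F9) and a few letters (\u06EE, \u06EF, \u06FA-\u06FC), so those characters are stripped as well.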
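
To make the list above concrete, the following sketch prints the code points of a word before and after the call. It uses a local copy of removeTashkeel as listed on this page so that it runs without the package; toCodePoints is a hypothetical helper written for this example, not part of the library.

```typescript
// Local copy of removeTashkeel as listed above, so the snippet runs on its own.
const removeTashkeel = (text: string): string =>
  text
    .replace(/\u0671/g, '\u0627') // Wasl alef → regular alef
    .replace(/[\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06FC]/g, '');

// Hypothetical helper (not part of the library): label each character as U+XXXX.
const toCodePoints = (text: string): string[] =>
  Array.from(text, (ch) => {
    const hex = (ch.codePointAt(0) ?? 0).toString(16).toUpperCase();
    return 'U+' + ('0000' + hex).slice(-4);
  });

// ٱللَّهِ spelled out: wasl alef, lam, lam, shadda, fatha, heh, kasra.
const input = '\u0671\u0644\u0644\u0651\u064E\u0647\u0650';
const output = removeTashkeel(input);

console.log(toCodePoints(input).join(' '));
// U+0671 U+0644 U+0644 U+0651 U+064E U+0647 U+0650
console.log(toCodePoints(output).join(' '));
// U+0627 U+0644 U+0644 U+0647

// Extended Arabic-Indic digits sit inside \u06EA-\u06FC, so they are removed too.
const digitsRemoved = removeTashkeel('\u06F1\u06F2\u06F3') === '';
console.log(digitsRemoved); // true
```

The wasl alef is the only character that is replaced; everything else matched by the second pattern is dropped.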

Advanced normalization

normalizeArabic(text: string): string

Advanced Arabic normalization for search indexing. Handles Unicode normalization, variant unification, and cleanup.

Use case: preparing user input for searching (unifies alef variants, removes tashkeel, strips tatweel, and so on).
import { normalizeArabic } from 'quran-search-engine';

const out = normalizeArabic('بِسْمِ ٱللَّهِ');
// out => 'بسم الله'
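
As an illustration of why this matters for search, the sketch below feeds several spellings of the same word through the function and checks that they collapse to one key. It assumes a local re-statement of both functions (with the Arabic character classes written as \u escapes) so that it runs without the package; the spellings array is example data chosen for this page.

```typescript
// Local re-statement of the documented functions; the regexes match the ones on this page.
const removeTashkeel = (text: string): string =>
  text
    .replace(/\u0671/g, '\u0627')
    .replace(/[\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06FC]/g, '');

const normalizeArabic = (text: string): string => {
  if (!text) return '';
  return removeTashkeel(text)
    .normalize('NFC')
    .replace(/[\u0670\u0640]/g, '') // dagger alif + tatweel
    .replace(/[\u0625\u0623\u0622\u0671]/g, '\u0627') // إ أ آ ٱ → ا
    .replace(/[\u0624\u0626\u0621]/g, '\u0621') // ؤ ئ ء → ء
    .replace(/\u0649/g, '\u064A') // ى → ي
    .replace(/[\r\n]+/g, ' ')
    .replace(/[^\u0621-\u064A\s-]+/g, '')
    .replace(/\s{2,}/g, ' ')
    .trim();
};

// Example data: vocalized, hamza-on-alef, bare alef, and tatweel-stretched spellings.
const spellings = ['أَحْمَد', 'أحمد', 'احمد', 'أحـمد'];
const keys = spellings.map(normalizeArabic);

// Every spelling should reduce to the same search key.
const allEqual = keys.every((key) => key === keys[0]);
console.log(keys[0], allEqual);
```

A user who types the bare-alef spelling therefore matches text stored with hamza and full vocalization.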

Normalization steps

1. Remove diacritics

Calls removeTashkeel() to strip diacritical marks, then applies Unicode NFC normalization to the result.
let normalizedText = removeTashkeel(text).normalize('NFC');
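2. Remove special characters

Removes tatweel (the elongation character, \u0640). The pattern also lists dagger alif (\u0670), which removeTashkeel() has already stripped in step 1, so that part acts as a safeguard.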
// dagger alif + tatweel
normalizedText = normalizedText.replace(/[\u0670\u0640]/g, '');
3. Unify alef variants

Normalizes all alef variants to a single form.
// alef variants → ا
normalizedText = normalizedText.replace(/[إأآٱ]/g, 'ا');
Converts: إ أ آ ٱ → ا
4. Unify hamza variants

Normalizes hamza on different carriers to standalone hamza.
// hamza variants → ء
normalizedText = normalizedText.replace(/[ؤئء]/g, 'ء');
Converts: ؤ ئ → ء
5. Unify ya variants

Converts alif maqsura to regular ya.
// alif maqsura → ي
normalizedText = normalizedText.replace(/ى/g, 'ي');
Converts: ى → ي
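6. Clean whitespace and control characters

Replaces line breaks with a space, removes every character outside \u0621-\u064A except whitespace and the hyphen (this drops Latin letters, digits, and punctuation), and collapses runs of whitespace into a single space.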
// remove control chars / CRLF / non-Arabic symbols
normalizedText = normalizedText.replace(/[\r\n]+/g, ' ');
normalizedText = normalizedText.replace(/[^\u0621-\u064A\s-]+/g, '');
normalizedText = normalizedText.replace(/\s{2,}/g, ' ');
7. Trim and return

Removes leading and trailing whitespace.
return normalizedText.trim();
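
The seven steps above can be traced on a single input to see what each one contributes. This is a sketch for exploration, not the library's code path: the steps array and traceNormalization are hypothetical names, removeTashkeel is a local copy of the listing on this page, and the input string is example data.

```typescript
// Local copy of removeTashkeel so the trace runs without the package.
const removeTashkeel = (text: string): string =>
  text
    .replace(/\u0671/g, '\u0627')
    .replace(/[\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E8\u06EA-\u06FC]/g, '');

// Each entry mirrors one numbered step, in the same order.
const steps: Array<[string, (s: string) => string]> = [
  ['1 diacritics + NFC', (s) => removeTashkeel(s).normalize('NFC')],
  ['2 dagger alif + tatweel', (s) => s.replace(/[\u0670\u0640]/g, '')],
  ['3 alef variants', (s) => s.replace(/[\u0625\u0623\u0622\u0671]/g, '\u0627')],
  ['4 hamza variants', (s) => s.replace(/[\u0624\u0626\u0621]/g, '\u0621')],
  ['5 alif maqsura', (s) => s.replace(/\u0649/g, '\u064A')],
  ['6 whitespace + non-Arabic', (s) =>
    s.replace(/[\r\n]+/g, ' ').replace(/[^\u0621-\u064A\s-]+/g, '').replace(/\s{2,}/g, ' ')],
  ['7 trim', (s) => s.trim()],
];

// Hypothetical helper: apply the steps in order and record each intermediate string.
function traceNormalization(text: string): Array<[string, string]> {
  const trace: Array<[string, string]> = [];
  let current = text;
  for (const [label, apply] of steps) {
    current = apply(current);
    trace.push([label, current]);
  }
  return trace;
}

// Example input with diacritics, a line break, tatweel, and a Latin-digit reference.
const trace = traceNormalization('إِنَّ الْهُدَىٰ\n  هُدَى اللَّـهِ 2:120');
for (const [label, value] of trace) {
  // JSON.stringify makes line breaks and trailing spaces visible.
  console.log(`${label}: ${JSON.stringify(value)}`);
}
const finalValue = trace[trace.length - 1][1];
```

For this input, step 4 leaves the string unchanged because it contains no hamza carriers, and the trailing space left after step 6 removes "2:120" is what step 7 trims.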

Full implementation

From src/utils/normalization.ts:20-43:
export const normalizeArabic = (text: string): string => {
  if (!text) return '';

  let normalizedText = removeTashkeel(text).normalize('NFC');

  // dagger alif + tatweel
  normalizedText = normalizedText.replace(/[\u0670\u0640]/g, '');

  // alef variants → ا
  normalizedText = normalizedText.replace(/[إأآٱ]/g, 'ا');

  // hamza variants → ء
  normalizedText = normalizedText.replace(/[ؤئء]/g, 'ء');

  // alif maqsura → ي
  normalizedText = normalizedText.replace(/ى/g, 'ي');

  // remove control chars / CRLF / non-Arabic symbols
  normalizedText = normalizedText.replace(/[\r\n]+/g, ' ');
  normalizedText = normalizedText.replace(/[^\u0621-\u064A\s-]+/g, '');
  normalizedText = normalizedText.replace(/\s{2,}/g, ' ');

  return normalizedText.trim();
};
The normalizeArabic() function is used internally throughout the search engine to ensure consistent matching:
import { normalizeArabic } from 'quran-search-engine';

export function containsAllTokens(value: string, query: string): boolean {
  const normalizedQuery = normalizeArabic(query);
  if (!normalizedQuery) return false;

  const tokens = normalizedQuery.split(/\s+/);
  const normalizedValue = normalizeArabic(value);
  return tokens.every((token) => normalizedValue.includes(token));
}
Always normalize both the search query and the text being searched to ensure accurate matching regardless of input variations.
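
One way to follow this advice without re-normalizing the stored text on every query is to normalize it once up front. The sketch below is an assumption about how a caller might do that, not something the library provides: Verse, buildIndex, and searchIndex are hypothetical names, and standIn is a simplified normalizer used only so the example runs on its own. In a real project, normalizeArabic from 'quran-search-engine' would be passed instead.

```typescript
type Normalizer = (text: string) => string;

interface Verse { id: string; text: string }
interface IndexedVerse extends Verse { normalized: string }

// Normalize each verse once and keep the result next to the original text.
function buildIndex(verses: Verse[], normalize: Normalizer): IndexedVerse[] {
  return verses.map((verse) => ({ ...verse, normalized: normalize(verse.text) }));
}

// Same token logic as containsAllTokens, applied to the pre-normalized text.
function searchIndex(index: IndexedVerse[], query: string, normalize: Normalizer): Verse[] {
  const normalizedQuery = normalize(query);
  if (!normalizedQuery) return [];
  const tokens = normalizedQuery.split(/\s+/);
  return index.filter((verse) => tokens.every((token) => verse.normalized.includes(token)));
}

// Simplified stand-in (assumption): wasl alef → alef, strip common diacritics, tidy spaces.
const standIn: Normalizer = (text) =>
  text
    .replace(/\u0671/g, '\u0627')
    .replace(/[\u064B-\u065F\u0670]/g, '')
    .replace(/\s+/g, ' ')
    .trim();

const index = buildIndex(
  [
    { id: '1:1', text: 'بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ' },
    { id: '1:2', text: 'ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ' },
  ],
  standIn,
);

// Token order in the query does not matter; both tokens must be present.
const hits = searchIndex(index, 'الله بسم', standIn).map((verse) => verse.id);
console.log(hits);
```

The query and the indexed text must go through the same normalizer; mixing two different ones reintroduces the mismatches that normalization is meant to remove.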

Character unification table

| Category | Original characters | Normalized to |
| --- | --- | --- |
| Alef variants | إ أ آ ٱ | ا |
| Hamza variants | ؤ ئ | ء |
| Ya variants | ى | ي |
| Removed | Diacritics, tatweel, dagger alif | (empty) |
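normalizeArabic() returns an empty string for empty input. Because it checks if (!text) before doing anything else, it also returns an empty string when null or undefined is passed at runtime, even though the declared parameter type is string. removeTashkeel() has no such guard, so it should only be called with an actual string.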
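
In strict TypeScript code the declared string parameter means that optional values cannot be passed directly. A small typed wrapper is one possible way to bridge that gap; normalizeOptional below is a hypothetical helper, not part of the library, and trimOnly is a stand-in normalizer used only to keep the example self-contained.

```typescript
type Normalizer = (text: string) => string;

// Hypothetical wrapper: accept optional values and fall back to an empty string.
function normalizeOptional(
  value: string | null | undefined,
  normalize: Normalizer,
): string {
  return value ? normalize(value) : '';
}

// Stand-in normalizer for demonstration; pass normalizeArabic in a real project.
const trimOnly: Normalizer = (text) => text.trim();

const fromNull = normalizeOptional(null, trimOnly);
const fromUndefined = normalizeOptional(undefined, trimOnly);
const fromText = normalizeOptional('  بسم  ', trimOnly);

console.log(fromNull === '', fromUndefined === '', fromText);
```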
