Search architecture

The Quran Search Engine uses a multi-layered architecture that combines simple text matching, linguistic analysis, and fuzzy search to deliver accurate and relevant results.

Search layers

The search engine processes queries through three distinct layers, each providing progressively more sophisticated matching:

1. Simple search layer

The first layer performs direct text matching using normalized Arabic.

Simple search uses AND logic: all query tokens must be present in the verse for a match.

export const simpleSearch = <T extends Record<string, unknown>>(
  items: T[],
  query: string,
  searchField: keyof T,
): T[] => {
  const cleanQuery = normalizeArabic(query.replace(/[^\u0600-\u06FF\s]+/g, '').trim());
  if (!cleanQuery) return [];

  const queryTokens = cleanQuery.split(/\s+/);

  return items.filter((item) => {
    const fieldValue = normalizeArabic(String(item[searchField] || ''));
    // AND logic: All tokens must be present
    return queryTokens.every((token) => fieldValue.includes(token));
  });
};

Key features:

Normalizes both query and verse text using normalizeArabic()
Strips non-Arabic characters from the query
Splits query into tokens by whitespace
Returns verses where every token appears in the text
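
The features listed above can be tried end to end with the small sketch below. It repeats simpleSearch so that it runs on its own. The normalizeArabic shown here is a hypothetical stand-in (the library's real normalization rules may differ), and the sample rows are made up for illustration.

```typescript
// Hypothetical stand-in for the library's normalizeArabic(); the real rules may differ.
const normalizeArabic = (text: string): string =>
  text
    .replace(/[\u064B-\u065F\u0670\u0640]/g, '') // drop diacritics, dagger alif, tatweel
    .replace(/[\u0622\u0623\u0625]/g, '\u0627') // fold alif variants into bare alif
    .replace(/\s+/g, ' ')
    .trim();

const simpleSearch = <T extends Record<string, unknown>>(
  items: T[],
  query: string,
  searchField: keyof T,
): T[] => {
  const cleanQuery = normalizeArabic(query.replace(/[^\u0600-\u06FF\s]+/g, '').trim());
  if (!cleanQuery) return [];
  const queryTokens = cleanQuery.split(/\s+/);
  return items.filter((item) => {
    const fieldValue = normalizeArabic(String(item[searchField] ?? ''));
    return queryTokens.every((token) => fieldValue.includes(token));
  });
};

// Made-up sample rows; only the fields used here are included.
const verses = [
  { gid: 1, standard: 'بسم الله الرحمن الرحيم' },
  { gid: 2, standard: 'الحمد لله رب العالمين' },
  { gid: 3, standard: 'الرحمن الرحيم' },
];

// Both tokens must appear, so only gid 1 matches.
const both = simpleSearch(verses, 'الله الرحمن', 'standard');

// A trailing kasra (U+0650) is removed by normalization, so gid 1 and gid 3 match.
const withDiacritic = simpleSearch(verses, 'الرحمن\u0650', 'standard');

// A query with no Arabic characters is emptied by the strip step and returns [].
const latinOnly = simpleSearch(verses, 'hello', 'standard');
```

Because matching relies on includes(), a token also matches when it is a substring of a longer word in the verse.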

2. Linguistic search layer

The second layer uses morphological analysis to find lemma and root matches.

Linguistic search requires morphology data and a word map to function. It can be disabled by setting options.lemma = false and options.root = false.

For each query token, the engine:

Looks up the token in the word map to find its canonical lemma and root
Searches morphology data for verses containing matching lemmas or roots
Applies AND logic across tokens (all tokens must match)

const entry = wordMap[token];
if (entry) {
  if (options.lemma && entry.lemma) {
    const lemmaMatches = getPositiveTokens(
      verse,
      'lemma',
      entry.lemma,
      undefined,
      token,
      morphologyMap,
    );
    if (lemmaMatches.length > 0) {
      score += lemmaMatches.length * 2;
      matchType = 'lemma';
    }
  }

  if (options.root && entry.root) {
    const rootMatches = getPositiveTokens(
      verse,
      'root',
      undefined,
      entry.root,
      token,
      morphologyMap,
      wordMap,
    );
    if (rootMatches.length > 0) {
      score += rootMatches.length * 1;
      matchType = 'root';
    }
  }
}

This layer enables:

Finding different forms of the same word (inflections, conjugations)
Matching words with the same linguistic root
More comprehensive search results beyond exact text matches
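
As a rough illustration of the lookup and scoring described above, the sketch below scores one verse against a tokenized query. The data shapes (WordEntry, MorphWord, the two Map types) and the function name scoreVerseLinguistically are assumptions made for this example, not the library's actual types. It reuses the excerpt's weights (2 per lemma hit, 1 per root hit) and leaves out the fuzzy fallback for unmatched tokens. The sample entries are illustrative only.

```typescript
// Hypothetical shapes, for illustration only.
interface WordEntry { lemma?: string; root?: string }
interface MorphWord { lemma: string; root: string }
interface LinguisticOptions { lemma: boolean; root: boolean }

const scoreVerseLinguistically = (
  gid: number,
  tokens: string[],
  wordMap: Map<string, WordEntry>,
  morphologyMap: Map<number, MorphWord[]>,
  options: LinguisticOptions,
): number | null => {
  const words = morphologyMap.get(gid) ?? [];
  let score = 0;
  for (const token of tokens) {
    const entry = wordMap.get(token);
    if (!entry) return null; // unknown token: no linguistic match for this verse
    const lemmaHits = options.lemma && entry.lemma
      ? words.filter((w) => w.lemma === entry.lemma).length
      : 0;
    const rootHits = options.root && entry.root
      ? words.filter((w) => w.root === entry.root).length
      : 0;
    if (lemmaHits + rootHits === 0) return null; // AND logic: every token must match
    score += lemmaHits * 2 + rootHits; // lemma hits weigh twice as much as root hits
  }
  return score;
};

// Illustrative sample data.
const sampleWordMap = new Map<string, WordEntry>([
  ['الكتاب', { lemma: 'كتاب', root: 'كتب' }],
]);
const sampleMorphology = new Map<number, MorphWord[]>([
  [10, [{ lemma: 'كتاب', root: 'كتب' }, { lemma: 'علم', root: 'علم' }]],
  [11, [{ lemma: 'كاتب', root: 'كتب' }]],
]);

const allOn = { lemma: true, root: true };
// Verse 10 shares the lemma and the root: 1 * 2 + 1 = 3.
const scoreLemmaAndRoot = scoreVerseLinguistically(10, ['الكتاب'], sampleWordMap, sampleMorphology, allOn);
// Verse 11 shares only the root: 1.
const scoreRootOnly = scoreVerseLinguistically(11, ['الكتاب'], sampleWordMap, sampleMorphology, allOn);
// With root matching disabled, verse 11 no longer matches.
const scoreRootDisabled = scoreVerseLinguistically(
  11, ['الكتاب'], sampleWordMap, sampleMorphology, { lemma: true, root: false },
);
```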

3. Fuzzy search layer

The third layer uses Fuse.js as a fallback for tokens that don’t match exactly or linguistically.

Fuzzy search can be disabled entirely by setting options.fuzzy = false in your search options.

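import Fuse, { type IFuseOptions } from 'fuse.js';

// Factory: builds a Fuse index with the defaults used for Arabic verse text.
// Caller-supplied options override the defaults.
export const createArabicFuseSearch = <T>(
  collection: T[],
  keys: string[],
  options: Partial<IFuseOptions<T>> = {},
): Fuse<T> =>
  new Fuse(collection, {
    includeScore: true,
    includeMatches: true,
    threshold: 0.5,
    distance: 100,
    ignoreLocation: true,
    minMatchCharLength: 3,
    useExtendedSearch: true,
    keys,
    ...options,
  });

// Inside search(): the index is only built when fuzzy matching is enabled.
const fuseInstance = fuzzyEnabled
  ? createArabicFuseSearch(quranData, ['standard', 'uthmani'])
  : null;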

Fuzzy search configuration:

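threshold: 0.5 (score cutoff: 0.0 requires an exact match, 1.0 matches anything)
distance: 100 (how far from the expected location a match may sit; has no effect while ignoreLocation is true)
ignoreLocation: true (match anywhere in the text)
minMatchCharLength: 3 (shortest match length that counts)
useExtendedSearch: true (enables Fuse.js extended search operators in the query)
keys: both the standard and uthmani text fields are searched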

Adaptive threshold:

const hasHighQualityMatches = fuseResults.some(
  (res) => res.score !== undefined && res.score <= 0.25,
);
const cutoff = hasHighQualityMatches ? 0.35 : 0.5;

If high-quality fuzzy matches exist (score ≤ 0.25), the cutoff is tightened to 0.35 to filter out weaker matches.
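
A minimal sketch of how that cutoff could be applied to a result list follows. The FuzzyResult shape mirrors the score field Fuse.js returns when includeScore is true. The function name applyAdaptiveCutoff and the final filter step (keep results with score ≤ cutoff) are assumptions, since the excerpt above only shows how the cutoff is chosen.

```typescript
interface FuzzyResult<T> {
  item: T;
  score?: number; // Fuse.js convention: 0 is a perfect match, 1 is a complete mismatch
}

// Hypothetical helper: choose the cutoff as in the excerpt, then filter by it.
const applyAdaptiveCutoff = <T>(results: FuzzyResult<T>[]): FuzzyResult<T>[] => {
  const hasHighQualityMatches = results.some(
    (res) => res.score !== undefined && res.score <= 0.25,
  );
  const cutoff = hasHighQualityMatches ? 0.35 : 0.5;
  // Assumed filter step: drop anything weaker than the cutoff.
  return results.filter((res) => res.score !== undefined && res.score <= cutoff);
};

// One strong match (0.1) tightens the cutoff to 0.35, so 0.45 is dropped.
const tightened = applyAdaptiveCutoff([
  { item: 'a', score: 0.1 },
  { item: 'b', score: 0.3 },
  { item: 'c', score: 0.45 },
]);

// No strong match: the cutoff stays at 0.5, so both results are kept.
const relaxed = applyAdaptiveCutoff([
  { item: 'b', score: 0.3 },
  { item: 'c', score: 0.45 },
]);
```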

Combined search flow

The main search() function orchestrates all three layers:

// 1. Run simple search
const simpleMatches = simpleSearch(quranData, cleanQuery, 'standard');

// 2. Run advanced linguistic search (includes fuzzy fallback per token)
const advancedMatches = performAdvancedLinguisticSearch(
  cleanQuery,
  quranData,
  options,
  fuseInstance,
  wordMap,
  morphologyMap,
);

// 3. Combine and deduplicate by gid
const allMatches = [...simpleMatches, ...advancedMatches];
const gidSet = new Set<number>();
const combined: ScoredVerse<TVerse>[] = [];

for (const verse of allMatches) {
  if (!gidSet.has(verse.gid)) {
    gidSet.add(verse.gid);
    combined.push(
      computeScore(verse, cleanQuery, morphologyMap, wordMap, options, mapEntry, fuseMatches),
    );
  }
}

// 4. Sort by relevance
combined.sort((a, b) => b.matchScore - a.matchScore);

Process:

Run simple search and linguistic search on the same cleaned query
Merge results and deduplicate by verse ID (gid)
Compute scores for all matched verses
Sort by score (highest first)
Apply pagination
Return results with metadata (counts, pagination info)

Deduplication ensures each verse appears only once in results, even if it matches through multiple layers.
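
The merge, sort, and pagination steps can be sketched in isolation as shown below. This is an illustration under assumptions: the helper name mergeAndRank, the page and limit parameters, and the shape of the returned metadata are hypothetical, and the verses are assumed to carry a matchScore already (the actual flow computes the score while deduplicating).

```typescript
interface ScoredRow {
  gid: number;
  matchScore: number;
}

// Hypothetical helper: dedupe by gid, rank by score, then slice out one page.
const mergeAndRank = <T extends ScoredRow>(
  simpleMatches: T[],
  advancedMatches: T[],
  page = 1,
  limit = 10,
) => {
  const seen = new Set<number>();
  const combined: T[] = [];
  for (const verse of [...simpleMatches, ...advancedMatches]) {
    if (!seen.has(verse.gid)) {
      seen.add(verse.gid); // first occurrence wins, later duplicates are skipped
      combined.push(verse);
    }
  }
  combined.sort((a, b) => b.matchScore - a.matchScore); // highest score first
  const start = (page - 1) * limit;
  return {
    results: combined.slice(start, start + limit),
    total: combined.length,
    page,
    totalPages: Math.ceil(combined.length / limit),
  };
};

const simpleRows = [{ gid: 1, matchScore: 5 }, { gid: 2, matchScore: 3 }];
const advancedRows = [{ gid: 2, matchScore: 3 }, { gid: 3, matchScore: 9 }];

// gid 2 appears in both inputs but only once in the merged list.
const firstPage = mergeAndRank(simpleRows, advancedRows, 1, 2);
const secondPage = mergeAndRank(simpleRows, advancedRows, 2, 2);
```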

Why this architecture?

This multi-layered approach provides:

✓ Precision: Exact matches score highest
✓ Recall: Linguistic matching finds related forms
✓ Flexibility: Fuzzy search catches typos and variants
✓ Relevance: Scoring prioritizes better matches
✓ Performance: Each layer can be enabled/disabled based on needs

For most use cases, enable all layers with { lemma: true, root: true, fuzzy: true } (fuzzy is enabled by default).
