The Quran Search Engine uses a multi-layered architecture that combines simple text matching, linguistic analysis, and fuzzy search to deliver accurate and relevant results.

Search layers

The search engine processes queries through three distinct layers, each providing progressively more sophisticated matching:

1. Simple search layer

The first layer performs direct text matching using normalized Arabic.
Simple search uses AND logic: all query tokens must be present in the verse for a match.
export const simpleSearch = <T extends Record<string, unknown>>(
  items: T[],
  query: string,
  searchField: keyof T,
): T[] => {
  const cleanQuery = normalizeArabic(query.replace(/[^\u0600-\u06FF\s]+/g, '').trim());
  if (!cleanQuery) return [];

  const queryTokens = cleanQuery.split(/\s+/);

  return items.filter((item) => {
    const fieldValue = normalizeArabic(String(item[searchField] || ''));
    // AND logic: All tokens must be present
    return queryTokens.every((token) => fieldValue.includes(token));
  });
};
Key features:
  • Normalizes both query and verse text using normalizeArabic()
  • Strips non-Arabic characters from the query
  • Splits query into tokens by whitespace
  • Returns verses where every token appears in the text
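
As a rough illustration of the AND logic, the sketch below runs a local copy of simpleSearch over two sample verses. The normalizeArabic shown here is a simplified stand-in that only strips diacritics and unifies alef forms; it is an assumption for the example, not the library's implementation. The Verse type and the sample data are also hypothetical.

```typescript
// Simplified stand-in for the library's normalizeArabic (assumption).
const normalizeArabic = (text: string): string =>
  text
    .replace(/[\u064B-\u065F\u0670\u0640]/g, '') // diacritics, superscript alef, tatweel
    .replace(/[\u0622\u0623\u0625\u0671]/g, '\u0627'); // alef variants -> bare alef

const simpleSearch = <T extends Record<string, unknown>>(
  items: T[],
  query: string,
  searchField: keyof T,
): T[] => {
  const cleanQuery = normalizeArabic(query.replace(/[^\u0600-\u06FF\s]+/g, '').trim());
  if (!cleanQuery) return [];
  const queryTokens = cleanQuery.split(/\s+/);
  return items.filter((item) => {
    const fieldValue = normalizeArabic(String(item[searchField] || ''));
    return queryTokens.every((token) => fieldValue.includes(token));
  });
};

// Hypothetical sample data.
type Verse = { gid: number; standard: string };
const verses: Verse[] = [
  { gid: 1, standard: 'بسم الله الرحمن الرحيم' },
  { gid: 2, standard: 'الحمد لله رب العالمين' },
];

// Both tokens occur in verse 1 only.
const bothTokens = simpleSearch(verses, 'الله الرحيم', 'standard');
// Each token occurs somewhere, but no single verse holds both, so nothing matches.
const splitTokens = simpleSearch(verses, 'الله رب', 'standard');
// Latin letters and ASCII digits are stripped before matching.
const mixedQuery = simpleSearch(verses, 'الله abc 123', 'standard');
// A query with no Arabic characters is empty after cleaning.
const emptyQuery = simpleSearch(verses, 'hello', 'standard');

console.log(bothTokens.map((v) => v.gid), splitTokens.length, mixedQuery.length, emptyQuery.length);
```

Because matching relies on includes(), a token also matches when it appears inside a longer word, which is worth keeping in mind for very short tokens.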

2. Linguistic search layer

The second layer uses morphological analysis to find lemma and root matches.
Linguistic search requires morphology data and a word map to function. It can be disabled by setting options.lemma = false and options.root = false.
For each query token, the engine:
  1. Looks up the token in the word map to find its canonical lemma and root
  2. Searches morphology data for verses containing matching lemmas or roots
  3. Applies AND logic across tokens (all tokens must match)
const entry = wordMap[token];
if (entry) {
  if (options.lemma && entry.lemma) {
    const lemmaMatches = getPositiveTokens(
      verse,
      'lemma',
      entry.lemma,
      undefined,
      token,
      morphologyMap,
    );
    if (lemmaMatches.length > 0) {
      score += lemmaMatches.length * 2;
      matchType = 'lemma';
    }
  }

  if (options.root && entry.root) {
    const rootMatches = getPositiveTokens(
      verse,
      'root',
      undefined,
      entry.root,
      token,
      morphologyMap,
      wordMap,
    );
    if (rootMatches.length > 0) {
      score += rootMatches.length * 1;
      matchType = 'root';
    }
  }
}
This layer enables:
  • Finding different forms of the same word (inflections, conjugations)
  • Matching words with the same linguistic root
  • More comprehensive search results beyond exact text matches
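
The weighting in the excerpt (2 points per lemma match, 1 point per root match) can be sketched in isolation. Everything below is a hypothetical simplification: the WordMapEntry, MorphWord, and scoreToken names are invented for this example, and plain string equality stands in for getPositiveTokens, whose internals are not shown here.

```typescript
// Hypothetical shapes for illustration; the library's real types may differ.
type WordMapEntry = { lemma?: string; root?: string };
type MorphWord = { lemma: string; root: string };
type LinguisticOptions = { lemma: boolean; root: boolean };

// Scores one query token against the morphology of one verse.
const scoreToken = (
  verseWords: MorphWord[],
  entry: WordMapEntry | undefined,
  options: LinguisticOptions,
): { score: number; lemmaHits: number; rootHits: number } => {
  // A disabled option or a token missing from the word map yields no target.
  const lemma = options.lemma ? entry?.lemma : undefined;
  const root = options.root ? entry?.root : undefined;

  const lemmaHits = lemma ? verseWords.filter((w) => w.lemma === lemma).length : 0;
  const rootHits = root ? verseWords.filter((w) => w.root === root).length : 0;

  // Same weights as the excerpt: lemma matches count double.
  return { score: lemmaHits * 2 + rootHits * 1, lemmaHits, rootHits };
};

// Hypothetical sample data: one verse with three analysed words.
const verseWords: MorphWord[] = [
  { lemma: 'كتاب', root: 'كتب' },
  { lemma: 'كتب', root: 'كتب' },
  { lemma: 'قال', root: 'قول' },
];
const wordMap: Record<string, WordMapEntry> = {
  'كتب': { lemma: 'كتب', root: 'كتب' },
};

const full = scoreToken(verseWords, wordMap['كتب'], { lemma: true, root: true });
const lemmaOnly = scoreToken(verseWords, wordMap['كتب'], { lemma: true, root: false });
const unknown = scoreToken(verseWords, wordMap['xyz'], { lemma: true, root: true });

console.log(full, lemmaOnly, unknown);
```

With both options on, the sample token gets one lemma hit and two root hits, so root matching widens recall while lemma matches still contribute more per hit.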

3. Fuzzy search layer

The third layer uses Fuse.js as a fallback for tokens that don’t match exactly or linguistically.
Fuzzy search can be disabled entirely by setting options.fuzzy = false in your search options.
const fuseInstance = fuzzyEnabled
  ? createArabicFuseSearch(quranData, ['standard', 'uthmani'])
  : null;

export const createArabicFuseSearch = <T>(
  collection: T[],
  keys: string[],
  options: Partial<IFuseOptions<T>> = {},
): Fuse<T> =>
  new Fuse(collection, {
    includeScore: true,
    includeMatches: true,
    threshold: 0.5,
    distance: 100,
    ignoreLocation: true,
    minMatchCharLength: 3,
    useExtendedSearch: true,
    keys,
    ...options,
  });
Fuzzy search configuration:
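  • threshold: 0.5 (highest Fuse.js score accepted as a match; 0 is an exact match, 1 matches anything)
  • distance: 100 (how far from the expected location a match may be; Fuse.js ignores this while ignoreLocation is true)
  • ignoreLocation: true (match anywhere in the text)
  • minMatchCharLength: 3 (minimum length, in characters, of a matched span)
  • keys: standard and uthmani (both text fields are searched)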
Adaptive threshold:
const hasHighQualityMatches = fuseResults.some(
  (res) => res.score !== undefined && res.score <= 0.25,
);
const cutoff = hasHighQualityMatches ? 0.35 : 0.5;
If high-quality fuzzy matches exist (score ≤ 0.25), the cutoff is tightened to 0.35 to filter out weaker matches.
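
A minimal sketch of how such a cutoff might be applied to a result list follows. The FuzzyResult type and the applyAdaptiveCutoff helper are hypothetical, and filtering with "score <= cutoff" is an assumption, since the excerpt only shows how the cutoff is chosen.

```typescript
// Hypothetical result shape, loosely modelled on a Fuse.js result with includeScore.
type FuzzyResult<T> = { item: T; score?: number };

// Chooses the cutoff as in the excerpt, then keeps results at or below it (assumption).
const applyAdaptiveCutoff = <T>(results: FuzzyResult<T>[]): FuzzyResult<T>[] => {
  const hasHighQualityMatches = results.some(
    (res) => res.score !== undefined && res.score <= 0.25,
  );
  const cutoff = hasHighQualityMatches ? 0.35 : 0.5;
  return results.filter((res) => res.score !== undefined && res.score <= cutoff);
};

// One strong match (0.1) is present, so the cutoff tightens to 0.35.
const withStrongMatch = applyAdaptiveCutoff([
  { item: 'a', score: 0.1 },
  { item: 'b', score: 0.3 },
  { item: 'c', score: 0.45 },
]);

// No score is at or below 0.25, so the looser 0.5 cutoff applies.
const withoutStrongMatch = applyAdaptiveCutoff([
  { item: 'd', score: 0.3 },
  { item: 'e', score: 0.45 },
  { item: 'f', score: 0.6 },
]);

console.log(withStrongMatch.map((r) => r.item), withoutStrongMatch.map((r) => r.item));
```

Note that the same 0.45 score is dropped in the first list and kept in the second: weak matches only survive when nothing better is available.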

Combined search flow

The main search() function orchestrates all three layers:
// 1. Run simple search
const simpleMatches = simpleSearch(quranData, cleanQuery, 'standard');

// 2. Run advanced linguistic search (includes fuzzy fallback per token)
const advancedMatches = performAdvancedLinguisticSearch(
  cleanQuery,
  quranData,
  options,
  fuseInstance,
  wordMap,
  morphologyMap,
);

// 3. Combine and deduplicate by gid
const allMatches = [...simpleMatches, ...advancedMatches];
const gidSet = new Set<number>();
const combined: ScoredVerse<TVerse>[] = [];

for (const verse of allMatches) {
  if (!gidSet.has(verse.gid)) {
    gidSet.add(verse.gid);
    combined.push(
      computeScore(verse, cleanQuery, morphologyMap, wordMap, options, mapEntry, fuseMatches),
    );
  }
}

// 4. Sort by relevance
combined.sort((a, b) => b.matchScore - a.matchScore);
Process:
  1. Run simple search, then linguistic search (which includes the per-token fuzzy fallback)
  2. Merge results and deduplicate by verse ID (gid)
  3. Compute scores for all matched verses
  4. Sort by score (highest first)
  5. Apply pagination
  6. Return results with metadata (counts, pagination info)
Deduplication ensures each verse appears only once in results, even if it matches through multiple layers.
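
The merge, sort, and pagination steps can be sketched on their own. This is an illustrative simplification: the Scored type, the mergeAndRank helper, and the shape of the returned pagination metadata are assumptions, and scores are taken as already computed rather than produced by computeScore.

```typescript
// Hypothetical minimal verse shape with a precomputed score.
type Scored = { gid: number; matchScore: number };

const mergeAndRank = (
  simple: Scored[],
  advanced: Scored[],
  page: number, // 1-based page number (assumption)
  limit: number,
) => {
  // Keep the first occurrence of each gid, as in the excerpt.
  const seen = new Set<number>();
  const combined: Scored[] = [];
  for (const verse of [...simple, ...advanced]) {
    if (!seen.has(verse.gid)) {
      seen.add(verse.gid);
      combined.push(verse);
    }
  }

  // Highest score first.
  combined.sort((a, b) => b.matchScore - a.matchScore);

  // Slice out the requested page and report basic metadata.
  const start = (page - 1) * limit;
  return {
    results: combined.slice(start, start + limit),
    total: combined.length,
    page,
    totalPages: Math.ceil(combined.length / limit),
  };
};

// gid 2 is found by both layers but appears once in the output.
const firstPage = mergeAndRank(
  [{ gid: 1, matchScore: 3 }, { gid: 2, matchScore: 5 }],
  [{ gid: 2, matchScore: 5 }, { gid: 3, matchScore: 4 }],
  1,
  2,
);

console.log(firstPage.results.map((v) => v.gid), firstPage.total, firstPage.totalPages);
```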

Why this architecture?

This multi-layered approach provides:
  • Precision: Exact matches score highest
  • Recall: Linguistic matching finds related forms
  • Flexibility: Fuzzy search catches typos and variants
  • Relevance: Scoring prioritizes better matches
  • Performance: Each layer can be enabled/disabled based on needs
For most use cases, enable all layers with { lemma: true, root: true, fuzzy: true } (fuzzy is enabled by default).
