The Quran Search Engine supports multi-word queries and uses AND logic to ensure all query tokens are present in matching verses.

Query tokenization

When you search for multiple words, the query is automatically tokenized:
const arabicOnly = query.replace(/[^\u0621-\u064A\s]/g, '').trim();
const cleanQuery = normalizeArabic(arabicOnly);
const queryTokens = cleanQuery.split(/\s+/);
Process:
  1. Strip non-Arabic characters (keeps only Arabic letters and spaces)
  2. Normalize the query (unify alef variants, remove diacritics)
  3. Split by whitespace into individual tokens
Each word in your query becomes a separate token that must be matched independently.

Example tokenization

Input query: الله الرحمن
After processing:
queryTokens = ["الله", "الرحمن"]
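The three steps can be tried end to end with a small, self-contained sketch. `normalizeArabicSketch` is a hypothetical stand-in for the library's `normalizeArabic`, whose exact rules are not shown on this page, and `tokenizeQuery` is an illustrative name rather than a documented export.

```typescript
// Hypothetical stand-in for the library's normalizeArabic; the real rules may differ.
function normalizeArabicSketch(text: string): string {
  return text
    .replace(/[\u064B-\u0652\u0670]/g, '') // drop harakat and superscript alef
    .replace(/[\u0622\u0623\u0625\u0671]/g, '\u0627'); // map alef variants to bare alef
}

// Illustrative helper name; mirrors the three documented steps.
function tokenizeQuery(query: string): string[] {
  // 1. Keep only Arabic letters (U+0621 to U+064A) and whitespace
  const arabicOnly = query.replace(/[^\u0621-\u064A\s]/g, '').trim();
  // 2. Normalize what is left
  const cleanQuery = normalizeArabicSketch(arabicOnly);
  // 3. Split on runs of whitespace; an empty query yields no tokens
  return cleanQuery ? cleanQuery.split(/\s+/) : [];
}

const exampleTokens = tokenizeQuery('الله الرحمن');
console.log(exampleTokens.length); // 2
```

Because step 1 keeps only the range U+0621 to U+064A, digits, Latin text, and diacritics are already gone before normalization runs.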

AND logic implementation

All search layers use AND logic: every token must match for a verse to be included in results.

Simple search AND logic

The simple search layer uses Array.every() to enforce AND logic:
export const simpleSearch = <T extends Record<string, unknown>>(
  items: T[],
  query: string,
  searchField: keyof T,
): T[] => {
  const cleanQuery = normalizeArabic(query.replace(/[^\u0600-\u06FF\s]+/g, '').trim());
  if (!cleanQuery) return [];

  const queryTokens = cleanQuery.split(/\s+/);

  return items.filter((item) => {
    const fieldValue = normalizeArabic(String(item[searchField] || ''));
    // AND logic: All tokens must be present
    return queryTokens.every((token) => fieldValue.includes(token));
  });
};
What this does:
  • For each verse, check if every token appears in the text
  • Only return verses where all tokens are found
  • If even one token is missing, the verse is excluded
The .every() method returns true only if all tokens pass the test. If any token fails, the entire verse is rejected.
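The effect is easy to see on a tiny dataset. The sketch below is a simplified variant of the filter (plain substring checks, no Arabic normalization); `filterAllTokens` and the `Verse` shape are illustrative names, not part of the library.

```typescript
type Verse = { gid: number; text: string };

// Simplified AND filter: plain substring checks, no Arabic normalization.
function filterAllTokens(verses: Verse[], query: string): Verse[] {
  const tokens = query.trim().split(/\s+/).filter(Boolean);
  // Guard first: every() on an empty array is true and would match all verses.
  if (tokens.length === 0) return [];
  return verses.filter((v) => tokens.every((t) => v.text.includes(t)));
}

const sampleVerses: Verse[] = [
  { gid: 1, text: 'بسم الله الرحمن الرحيم' },
  { gid: 2, text: 'الحمد لله رب العالمين' },
  { gid: 3, text: 'الرحمن الرحيم' },
];

const hitGids = filterAllTokens(sampleVerses, 'الله الرحمن').map((v) => v.gid);
console.log(hitGids); // [1]
```

Verse 3 contains one of the two tokens and is still rejected. The empty-query guard matters for the same reason as the `if (!cleanQuery) return [];` line above: `every()` returns true for an empty token list.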

Linguistic search AND logic

The advanced linguistic search performs AND logic through set intersection:
const tokens = cleanQuery.split(/\s+/);

// 1. Find matches for EACH token separately
const tokenMatches = tokens.map((token) => {
  const matchingGids = new Set<number>();
  
  // ... find all verses that match this token
  // (via lemma, root, or fuzzy)
  
  return { type: 'linguistic', gids: matchingGids };
});

// 2. Intersect results (AND logic)
if (tokenMatches.length === 0) return [];

// Start with the first token's matches
let intersection = new Set(tokenMatches[0].gids);

// Intersect with each subsequent token
for (let i = 1; i < tokenMatches.length; i++) {
  const currentGids = tokenMatches[i].gids;
  if (currentGids.size === 0) return []; // Short-circuit
  intersection = new Set([...intersection].filter((gid) => currentGids.has(gid)));
  if (intersection.size === 0) return [];
}
Process:
  1. Find matches for each token independently
  2. Start with the first token’s matching verse IDs
  3. Filter to keep only IDs that also match the second token
  4. Continue filtering for each subsequent token
  5. Result: Only verses that match all tokens
The algorithm short-circuits early if any token has zero matches, avoiding unnecessary computation.

Visual example of set intersection

Query: الله الرحمن
Token 1 (الله) matches verses:     [1, 2, 3, 4, 5, 6, 7]
Token 2 (الرحمن) matches verses:   [1, 3, 5, 8, 9]

Intersection (verses with BOTH):   [1, 3, 5]
Only verses 1, 3, and 5 contain both tokens, so only they are returned.
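The same intersection can be reproduced in isolation. This is a minimal sketch of the technique; `intersectAll` is an illustrative helper, not a function exported by the library.

```typescript
// Intersect any number of ID sets, stopping as soon as the result is empty.
function intersectAll(sets: ReadonlyArray<ReadonlySet<number>>): Set<number> {
  const [first, ...rest] = sets;
  if (!first) return new Set<number>();
  let result = new Set<number>(first);
  for (const current of rest) {
    result = new Set([...result].filter((gid) => current.has(gid)));
    if (result.size === 0) break; // nothing left to narrow
  }
  return result;
}

const token1 = new Set([1, 2, 3, 4, 5, 6, 7]);
const token2 = new Set([1, 3, 5, 8, 9]);
console.log([...intersectAll([token1, token2])]); // [1, 3, 5]
```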

Per-token match types

Each token can match through different layers:
  • Linguistic match: Lemma or root match found for this token
  • Fuzzy match: Fuse.js found an approximate match for this token
const tokenMatches = tokens.map((token) => {
  const entry = wordMap[token];
  const matchingGids = new Set<number>();

  // Try linguistic search first
  if (entry) {
    if (options.lemma && targetLemma) {
      for (const verse of quranData) {
        const morph = morphologyMap.get(verse.gid);
        if (morph?.lemmas.some((lemma) =>
          normalizeArabic(lemma).includes(normalizeArabic(targetLemma))
        )) {
          matchingGids.add(verse.gid);
        }
      }
    }
    // ... similar for roots
  }

  if (matchingGids.size > 0) {
    return { type: 'linguistic', gids: matchingGids };
  }

  // Fallback to fuzzy for this token
  const fuseResults = fuseInstance.search(token);
  // ... process fuzzy results
  
  return { type: 'fuzzy', gids: fuzzyGids, fuseMatches: fuseMatchesMap };
});
Key insight:
  • Each token independently tries linguistic search first
  • If that fails, it falls back to fuzzy search
  • The final results must satisfy all tokens via their respective match types
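The fallback pattern itself is independent of how each layer finds its matches. The sketch below isolates it using two pluggable matchers; `matchToken`, `Matcher`, and the toy indexes are hypothetical and stand in for the lemma/root lookup and the Fuse.js search.

```typescript
type TokenMatch = { type: 'linguistic' | 'fuzzy'; gids: Set<number> };
type Matcher = (token: string) => Set<number>;

// Try the linguistic matcher first; use the fuzzy matcher only when it finds nothing.
function matchToken(token: string, linguistic: Matcher, fuzzy: Matcher): TokenMatch {
  const linguisticGids = linguistic(token);
  if (linguisticGids.size > 0) return { type: 'linguistic', gids: linguisticGids };
  return { type: 'fuzzy', gids: fuzzy(token) };
}

// Toy indexes with illustrative data only.
const linguisticIndex = new Map<string, number[]>([['الله', [1, 2]]]);
const fuzzyIndex = new Map<string, number[]>([['الرحمان', [1, 3]]]);
const lookup = (index: Map<string, number[]>): Matcher => (token) =>
  new Set<number>(index.get(token) ?? []);

const perToken = ['الله', 'الرحمان'].map((t) =>
  matchToken(t, lookup(linguisticIndex), lookup(fuzzyIndex)),
);
const matchTypes = perToken.map((m) => m.type);

// AND across tokens, whatever layer each one matched through.
const shared = perToken
  .map((m) => m.gids)
  .reduce((acc, gids) => new Set([...acc].filter((gid) => gids.has(gid))));

console.log(matchTypes, [...shared]); // ['linguistic', 'fuzzy'] [1]
```

The first token resolves linguistically, the misspelled second token only through the fuzzy layer, and the verse that satisfies both is still found.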

Multi-word scoring

Scoring accumulates across all matched tokens:
const queryTokens = cleanQuery.split(/\s+/);

// Check each token
for (const token of queryTokens) {
  // 1. Check exact matches for this token
  const textMatches = getPositiveTokens(verse, 'text', undefined, undefined, token, morphologyMap);
  if (textMatches.length > 0) {
    score += textMatches.length * 3;
  }

  // 2. Check lemma matches for this token
  const entry = wordMap[token];
  if (entry?.lemma) {
    const lemmaMatches = getPositiveTokens(verse, 'lemma', entry.lemma, ...);
    if (lemmaMatches.length > 0) {
      score += lemmaMatches.length * 2;
    }
  }

  // ... similar for roots
}

Scoring example

Query: الله الرحمن
Verse: “بسم الله الرحمن الرحيم”
Matches found:
  • Token الله: 1 exact match → +3 points
  • Token الرحمن: 1 exact match → +3 points
Total score: 6
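Because every result already contains all query tokens, scores separate verses by match strength and frequency: an exact match outweighs a lemma match, and a token that occurs several times in a verse adds to the score once per occurrence.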
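The arithmetic can be checked with a small sketch that takes per-token match counts as input. The weights 3 (exact) and 2 (lemma) come from the excerpt above; the root weight of 1 is an assumption inferred from Example 3 on this page, where a root match plus an exact match scores 4. `scoreVerse` and `MatchCounts` are illustrative names.

```typescript
type MatchCounts = { exact: number; lemma: number; root: number };

// exact and lemma weights are from the excerpt; the root weight is an assumption.
const WEIGHTS: MatchCounts = { exact: 3, lemma: 2, root: 1 };

// Sum the weighted match counts over all query tokens.
function scoreVerse(perToken: MatchCounts[]): number {
  return perToken.reduce(
    (score, c) =>
      score + c.exact * WEIGHTS.exact + c.lemma * WEIGHTS.lemma + c.root * WEIGHTS.root,
    0,
  );
}

// Two tokens, one exact match each: 3 + 3
const twoExact = scoreVerse([
  { exact: 1, lemma: 0, root: 0 },
  { exact: 1, lemma: 0, root: 0 },
]);
console.log(twoExact); // 6
```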

Examples

Example 1: Two-word exact match

import { search } from 'quran-search-engine';

const response = search(
  'الله الرحمن',
  quranData,
  morphologyMap,
  wordMap,
  { lemma: true, root: true }
);

// Results: Only verses containing BOTH الله AND الرحمن
console.log(response.results);
// Example output:
// [
//   { gid: 1, matchScore: 6, matchType: 'exact', ... },
//   { gid: 3, matchScore: 4, matchType: 'lemma', ... },
//   ...
// ]

Example 2: Three-word query

const response = search(
  'الله الرحمن الرحيم',
  quranData,
  morphologyMap,
  wordMap
);

// Results: Only verses with ALL THREE words
// Tokens: ["الله", "الرحمن", "الرحيم"]
// A verse must contain الله AND الرحمن AND الرحيم

Example 3: Mixed match types

Query: صلى محمد
Possible results:
  • Verse A: صلى (exact) + محمد (exact) → score: 6, matchType: 'exact'
  • Verse B: يصلون (lemma for صلى) + محمد (exact) → score: 5, matchType: 'exact'
  • Verse C: صلاة (root for صلى) + محمد (exact) → score: 4, matchType: 'exact'
All results contain both tokens, but through different match layers. The matchType reflects the best match quality found in that verse.
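One way to picture "best match quality" is a fixed precedence list. The sketch below is an assumption about how such a pick could work, not the engine's confirmed logic; the ordering exact, lemma, root, fuzzy and the names `MatchType` and `bestMatchType` are hypothetical.

```typescript
type MatchType = 'exact' | 'lemma' | 'root' | 'fuzzy';

// Assumed precedence, strongest first.
const QUALITY_ORDER: readonly MatchType[] = ['exact', 'lemma', 'root', 'fuzzy'];

// Return the strongest match type present among a verse's per-token matches.
function bestMatchType(found: readonly MatchType[]): MatchType | undefined {
  return QUALITY_ORDER.find((t) => found.includes(t));
}

console.log(bestMatchType(['lemma', 'exact'])); // 'exact'
console.log(bestMatchType(['root', 'fuzzy'])); // 'root'
```

Under this assumption, Verses A, B, and C all report 'exact' because محمد matches exactly in each of them.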

Example 4: No results when missing a token

Query: الله قرآن
Verse text: "بسم الله الرحمن الرحيم"
Result: This verse is excluded because it contains الله but not قرآن. Both tokens must match.

Why AND logic?

AND logic provides:
  • Precision: Results are more specific and relevant
  • User expectation: Matches natural language search behavior
  • Reduced noise: Eliminates verses that only partially match
  • Better UX: Users can refine searches by adding words
To find verses with ANY of your search terms (OR logic), run separate searches and merge the results in your application code.
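A minimal sketch of that merge step, assuming each result carries the `gid` and `matchScore` fields shown in the example output above. `mergeAny` and the `Hit` type are illustrative, and keeping the higher score for a duplicate verse is one possible policy, not the library's.

```typescript
type Hit = { gid: number; matchScore: number };

// Union of several result lists, deduplicated by gid and sorted by score (highest first).
function mergeAny(...resultSets: Hit[][]): Hit[] {
  const byGid = new Map<number, Hit>();
  for (const results of resultSets) {
    for (const hit of results) {
      const seen = byGid.get(hit.gid);
      // Keep the higher-scoring entry when a verse appears in more than one list.
      if (!seen || hit.matchScore > seen.matchScore) byGid.set(hit.gid, hit);
    }
  }
  return [...byGid.values()].sort((a, b) => b.matchScore - a.matchScore);
}

const merged = mergeAny(
  [{ gid: 1, matchScore: 3 }, { gid: 2, matchScore: 3 }],
  [{ gid: 2, matchScore: 5 }, { gid: 4, matchScore: 2 }],
);
console.log(merged.map((h) => h.gid)); // [2, 1, 4]
```

In application code, the arguments would be the `results` arrays from one `search` call per term.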

Common use cases

Searching for phrases

Multi-word search is not phrase search: word order and adjacency are ignored. It still finds every verse that contains all of the words:
// Find verses about Allah's mercy
const response = search('الله رحمة', quranData, morphologyMap, wordMap);

// Both words must appear (in any order)

Narrowing results

// Broad search
const broad = search('الله', quranData, morphologyMap, wordMap);
console.log(broad.pagination.totalResults); // e.g., 2,800 verses

// Narrowed search
const narrow = search('الله الرحمن', quranData, morphologyMap, wordMap);
console.log(narrow.pagination.totalResults); // e.g., 114 verses
Adding more words reduces the result set to more specific matches.
// Find verses about prayer and fasting
const response = search('صلاة صيام', quranData, morphologyMap, wordMap);

// Both concepts must be present

Performance considerations

The search engine optimizes multi-word queries through:
  1. Early termination: If any token has zero matches, stop immediately
  2. Set operations: Efficient intersection using Set data structure
  3. Deduplication: Verses appear only once even if matched by multiple layers
for (let i = 1; i < tokenMatches.length; i++) {
  const currentGids = tokenMatches[i].gids;
  if (currentGids.size === 0) return []; // Short-circuit
  intersection = new Set([...intersection].filter((gid) => currentGids.has(gid)));
  if (intersection.size === 0) return []; // Stop early
}
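For best performance with long queries, put the more specific or rarer words first: the intersection starts from the first token's matches, so a small first set keeps every later step cheap. Token order never affects which verses are returned, and the search engine applies its own optimizations regardless of the order you type.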
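If you want the intersection cost to be independent of the order in which words were typed, one option is to sort the per-token sets by size before intersecting. This is a sketch of that idea only; whether the engine does this internally is not shown on this page, and `intersectSmallestFirst` is a hypothetical helper.

```typescript
// Intersect ID sets starting from the smallest, so the working set stays small.
function intersectSmallestFirst(sets: ReadonlyArray<ReadonlySet<number>>): Set<number> {
  const ordered = [...sets].sort((a, b) => a.size - b.size);
  const [smallest, ...rest] = ordered;
  if (!smallest || smallest.size === 0) return new Set<number>();
  let result = new Set<number>(smallest);
  for (const current of rest) {
    result = new Set([...result].filter((gid) => current.has(gid)));
    if (result.size === 0) break; // stop early
  }
  return result;
}

const common = new Set([1, 2, 3, 4, 5, 6, 7]); // frequent word
const rare = new Set([3, 9]); // rare word
console.log([...intersectSmallestFirst([common, rare])]); // [3]
```

The result is the same for either argument order, since set intersection is commutative; only the amount of filtering work changes.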
