Multi-word search

The Quran Search Engine supports multi-word queries and uses AND logic to ensure all query tokens are present in matching verses.

Query tokenization

When you search for multiple words, the query is automatically tokenized:

const arabicOnly = query.replace(/[^\u0621-\u064A\s]/g, '').trim();
const cleanQuery = normalizeArabic(arabicOnly);
const queryTokens = cleanQuery.split(/\s+/);

Process:

Strip non-Arabic characters (keeps only Arabic letters and spaces)
Normalize the query (unify alef variants, remove diacritics)
Split by whitespace into individual tokens

Each word in your query becomes a separate token that must be matched independently.
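The three steps can be combined into one helper, shown here as a minimal sketch. `normalizeArabicSketch` is a hypothetical stand-in for the library's `normalizeArabic` (whose actual rules are not shown on this page), and `tokenizeQuery` is an illustrative name. The sketch also guards against an empty query, because splitting an empty string yields `['']` rather than an empty array.

```typescript
// Hypothetical stand-in for normalizeArabic: the library's actual rules may differ.
const normalizeArabicSketch = (text: string): string =>
  text
    .replace(/[\u064B-\u0652]/g, '') // remove short-vowel diacritics (tashkeel)
    .replace(/[\u0622\u0623\u0625]/g, '\u0627'); // unify alef variants to bare alef

// Illustrative helper combining the strip, normalize, and split steps.
const tokenizeQuery = (query: string): string[] => {
  const arabicOnly = query.replace(/[^\u0621-\u064A\s]/g, '').trim();
  const cleanQuery = normalizeArabicSketch(arabicOnly);
  // ''.split(/\s+/) returns [''], so return [] explicitly for empty input.
  return cleanQuery ? cleanQuery.split(/\s+/) : [];
};

console.log(tokenizeQuery('الله, 123 الرحمن!')); // two tokens: punctuation and digits are dropped
console.log(tokenizeQuery('hello 42')); // [] because no Arabic letters survive
```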

Example tokenization

Input query: الله الرحمن

After processing:

queryTokens = ["الله", "الرحمن"]

AND logic implementation

All search layers use AND logic: every token must match for a verse to be included in results.

Simple search AND logic

The simple search layer uses Array.every() to enforce AND logic:

export const simpleSearch = <T extends Record<string, unknown>>(
  items: T[],
  query: string,
  searchField: keyof T,
): T[] => {
  const cleanQuery = normalizeArabic(query.replace(/[^\u0600-\u06FF\s]+/g, '').trim());
  if (!cleanQuery) return [];

  const queryTokens = cleanQuery.split(/\s+/);

  return items.filter((item) => {
    const fieldValue = normalizeArabic(String(item[searchField] || ''));
    // AND logic: All tokens must be present
    return queryTokens.every((token) => fieldValue.includes(token));
  });
};

What this does:

For each verse, check if every token appears in the text
Only return verses where all tokens are found
If even one token is missing, the verse is excluded

The .every() method returns true only if all tokens pass the test. If any token fails, the entire verse is rejected.
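Two details of this check are easy to miss, and the sketch below makes them visible. `containsAllTokens` is an illustrative helper, not part of the library. First, `includes` is a substring test, so a token also matches inside a longer word. Second, `every` on an empty array returns `true`, which is why `simpleSearch` returns early when the cleaned query is empty.

```typescript
// Illustrative helper mirroring the filter predicate in simpleSearch.
const containsAllTokens = (text: string, tokens: string[]): boolean =>
  tokens.every((token) => text.includes(token));

const verseText = 'بسم الله الرحمن الرحيم';

console.log(containsAllTokens(verseText, ['الله', 'الرحمن'])); // true
console.log(containsAllTokens(verseText, ['الله', 'قران'])); // false: second token is missing
console.log(containsAllTokens(verseText, ['رحم'])); // true: substring of a longer word
console.log(containsAllTokens(verseText, [])); // true: vacuously, hence the empty-query guard
```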

Linguistic search AND logic

The advanced linguistic search performs AND logic through set intersection:

const tokens = cleanQuery.split(/\s+/);

// 1. Find matches for EACH token separately
const tokenMatches = tokens.map((token) => {
  const matchingGids = new Set<number>();
  
  // ... find all verses that match this token
  // (via lemma, root, or fuzzy)
  
  return { type: 'linguistic', gids: matchingGids };
});

// 2. Intersect results (AND logic)
if (tokenMatches.length === 0) return [];

// Start with the first token's matches
let intersection = new Set(tokenMatches[0].gids);

// Intersect with each subsequent token
for (let i = 1; i < tokenMatches.length; i++) {
  const currentGids = tokenMatches[i].gids;
  if (currentGids.size === 0) return []; // Short-circuit
  intersection = new Set([...intersection].filter((gid) => currentGids.has(gid)));
  if (intersection.size === 0) return [];
}

Process:

Find matches for each token independently
Start with the first token’s matching verse IDs
Filter to keep only IDs that also match the second token
Continue filtering for each subsequent token
Result: Only verses that match all tokens

The algorithm short-circuits early if any token has zero matches, avoiding unnecessary computation.

Visual example of set intersection

Query: الله الرحمن

Token 1 (الله) matches verses:     [1, 2, 3, 4, 5, 6, 7]
Token 2 (الرحمن) matches verses:   [1, 3, 5, 8, 9]

Intersection (verses with BOTH):   [1, 3, 5]

Only verses 1, 3, and 5 contain both tokens, so only they are returned.
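The same example can be reproduced in a few lines. This is a sketch of the intersection step in isolation; `intersectAll` is an illustrative name, not a function exported by the library.

```typescript
// Intersect any number of gid sets, stopping once the running result is empty.
const intersectAll = (sets: Set<number>[]): Set<number> => {
  if (sets.length === 0) return new Set<number>();
  let result = new Set<number>(sets[0]);
  for (let i = 1; i < sets.length && result.size > 0; i++) {
    const current = sets[i];
    result = new Set<number>(Array.from(result).filter((gid) => current.has(gid)));
  }
  return result;
};

const firstTokenMatches = new Set([1, 2, 3, 4, 5, 6, 7]);
const secondTokenMatches = new Set([1, 3, 5, 8, 9]);

console.log(Array.from(intersectAll([firstTokenMatches, secondTokenMatches]))); // [1, 3, 5]
```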

Per-token match types

Each token can match through different layers:

Linguistic match: Lemma or root match found for this token
Fuzzy match: Fuse.js found an approximate match for this token

const tokenMatches = tokens.map((token) => {
  const entry = wordMap[token];
  const matchingGids = new Set<number>();

  // Try linguistic search first
  if (entry) {
    if (options.lemma && targetLemma) {
      for (const verse of quranData) {
        const morph = morphologyMap.get(verse.gid);
        if (morph?.lemmas.some((lemma) =>
          normalizeArabic(lemma).includes(normalizeArabic(targetLemma))
        )) {
          matchingGids.add(verse.gid);
        }
      }
    }
    // ... similar for roots
  }

  if (matchingGids.size > 0) {
    return { type: 'linguistic', gids: matchingGids };
  }

  // Fallback to fuzzy for this token
  const fuseResults = fuseInstance.search(token);
  // ... process fuzzy results
  
  return { type: 'fuzzy', gids: fuzzyGids, fuseMatches: fuseMatchesMap };
});

Key insight:

Each token independently tries linguistic search first
If that fails, it falls back to fuzzy search
The final results must satisfy all tokens via their respective match types
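Stripped of the data access, the per-token decision reduces to the pattern below. This is a sketch under assumptions: `Lookup`, `matchToken`, and the two toy lookup functions are hypothetical placeholders for the lemma/root scan and the Fuse.js query shown above.

```typescript
type TokenMatch = { type: 'linguistic' | 'fuzzy'; gids: Set<number> };
type Lookup = (token: string) => Set<number>;

// Try the linguistic layer first; use the fuzzy layer only if it found nothing.
const matchToken = (token: string, linguistic: Lookup, fuzzy: Lookup): TokenMatch => {
  const linguisticGids = linguistic(token);
  if (linguisticGids.size > 0) {
    return { type: 'linguistic', gids: linguisticGids };
  }
  return { type: 'fuzzy', gids: fuzzy(token) };
};

// Toy lookups for demonstration only.
const linguisticIndex = new Map<string, number[]>([['known', [1, 3]]]);
const linguisticLookup: Lookup = (token) => new Set(linguisticIndex.get(token) ?? []);
const fuzzyLookup: Lookup = () => new Set([7]);

console.log(matchToken('known', linguisticLookup, fuzzyLookup).type); // 'linguistic'
console.log(matchToken('typo', linguisticLookup, fuzzyLookup).type); // 'fuzzy'
```

Because each token is resolved on its own, one query can mix a linguistic match for one word with a fuzzy match for another before the sets are intersected.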

Multi-word scoring

Scoring accumulates across all matched tokens:

const queryTokens = cleanQuery.split(/\s+/);

// Check each token
for (const token of queryTokens) {
  // 1. Check exact matches for this token
  const textMatches = getPositiveTokens(verse, 'text', undefined, undefined, token, morphologyMap);
  if (textMatches.length > 0) {
    score += textMatches.length * 3;
  }

  // 2. Check lemma matches for this token
  const entry = wordMap[token];
  if (entry?.lemma) {
    const lemmaMatches = getPositiveTokens(verse, 'lemma', entry.lemma, ...);
    if (lemmaMatches.length > 0) {
      score += lemmaMatches.length * 2;
    }
  }

  // ... similar for roots
}

Scoring example

Query: الله الرحمن
Verse: “بسم الله الرحمن الرحيم”

Matches found:

Token الله: 1 exact match → +3 points
Token الرحمن: 1 exact match → +3 points

Total score: 6

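Because every result already matches all tokens, scores differ by how strongly and how often each token matches: an exact match adds more than a lemma match, and repeated occurrences of a token add up.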
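The arithmetic can be sketched as a weighted sum over per-token match counts. The weights 3 (exact) and 2 (lemma) come from the snippet above; the root weight of 1 is an assumption, since the root branch is elided there. `TokenCounts` and `scoreVerse` are illustrative names.

```typescript
type TokenCounts = { exact: number; lemma: number; root: number };

// exact and lemma weights follow the snippet above; the root weight is an assumption.
const WEIGHTS: TokenCounts = { exact: 3, lemma: 2, root: 1 };

// Sum the weighted match counts of every query token for one verse.
const scoreVerse = (perToken: TokenCounts[]): number =>
  perToken.reduce(
    (score, counts) =>
      score +
      counts.exact * WEIGHTS.exact +
      counts.lemma * WEIGHTS.lemma +
      counts.root * WEIGHTS.root,
    0,
  );

// Two tokens, one exact match each.
console.log(scoreVerse([
  { exact: 1, lemma: 0, root: 0 },
  { exact: 1, lemma: 0, root: 0 },
])); // 6

// One lemma match plus one exact match.
console.log(scoreVerse([
  { exact: 0, lemma: 1, root: 0 },
  { exact: 1, lemma: 0, root: 0 },
])); // 5
```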

Examples

Example 1: Two-word exact match

import { search } from 'quran-search-engine';

const response = search(
  'الله الرحمن',
  quranData,
  morphologyMap,
  wordMap,
  { lemma: true, root: true }
);

// Results: Only verses containing BOTH الله AND الرحمن
console.log(response.results);
// Example output:
// [
//   { gid: 1, matchScore: 6, matchType: 'exact', ... },
//   { gid: 3, matchScore: 4, matchType: 'lemma', ... },
//   ...
// ]

Example 2: Three-word query

const response = search(
  'الله الرحمن الرحيم',
  quranData,
  morphologyMap,
  wordMap
);

// Results: Only verses with ALL THREE words
// Tokens: ["الله", "الرحمن", "الرحيم"]
// A verse must contain الله AND الرحمن AND الرحيم

Example 3: Mixed match types

Query: صلى محمد

Possible results:

Verse A: صلى (exact) + محمد (exact) → score: 6, matchType: 'exact'
Verse B: يصلون (lemma for صلى) + محمد (exact) → score: 5, matchType: 'exact'
Verse C: صلاة (root for صلى) + محمد (exact) → score: 4, matchType: 'exact'

All results contain both tokens, but through different match layers. The matchType reflects the best match quality found in that verse.

Example 4: No results when missing a token

Query: الله قرآن
Verse text: “بسم الله الرحمن الرحيم”
Result: This verse is excluded because it contains الله but not قرآن. Both tokens must match.

Why AND logic?

AND logic provides:

✓ Precision: Results are more specific and relevant
✓ User expectation: Matches natural language search behavior
✓ Reduced noise: Eliminates verses that only partially match
✓ Better UX: Users can refine searches by adding words

To find verses with ANY of your search terms (OR logic), run separate searches and merge the results in your application code.
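One way to do that merge is to key results by `gid` and keep the best-scoring entry for each verse. This is a sketch, assuming each result carries `gid` and `matchScore` as in the Example 1 output; `ScoredVerse` and `mergeAsOr` are illustrative names, and with paginated responses every page would have to be collected before merging.

```typescript
type ScoredVerse = { gid: number; matchScore: number };

// Union several result lists, deduplicating by gid and keeping the highest score.
const mergeAsOr = (resultLists: ScoredVerse[][]): ScoredVerse[] => {
  const best = new Map<number, ScoredVerse>();
  for (const list of resultLists) {
    for (const verse of list) {
      const existing = best.get(verse.gid);
      if (!existing || verse.matchScore > existing.matchScore) {
        best.set(verse.gid, verse);
      }
    }
  }
  // Highest score first.
  return Array.from(best.values()).sort((a, b) => b.matchScore - a.matchScore);
};

const firstWordResults: ScoredVerse[] = [
  { gid: 1, matchScore: 3 },
  { gid: 2, matchScore: 6 },
];
const secondWordResults: ScoredVerse[] = [
  { gid: 1, matchScore: 5 },
  { gid: 9, matchScore: 2 },
];

console.log(mergeAsOr([firstWordResults, secondWordResults]).map((v) => v.gid)); // [2, 1, 9]
```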

Common use cases

Searching for phrases

While this isn’t phrase search (word order doesn’t matter), multi-word search effectively finds verses containing all words:

// Find verses about Allah's mercy
const response = search('الله رحمة', quranData, morphologyMap, wordMap);

// Both words must appear (in any order)

Narrowing results

// Broad search
const broad = search('الله', quranData, morphologyMap, wordMap);
console.log(broad.pagination.totalResults); // e.g., 2,800 verses

// Narrowed search
const narrow = search('الله الرحمن', quranData, morphologyMap, wordMap);
console.log(narrow.pagination.totalResults); // e.g., 114 verses

Adding more words reduces the result set to more specific matches.

Topic-based search

// Find verses about prayer and fasting
const response = search('صلاة صيام', quranData, morphologyMap, wordMap);

// Both concepts must be present

Performance considerations

The search engine optimizes multi-word queries through:

Early termination: If any token has zero matches, stop immediately
Set operations: Efficient intersection using Set data structure
Deduplication: Verses appear only once even if matched by multiple layers

for (let i = 1; i < tokenMatches.length; i++) {
  const currentGids = tokenMatches[i].gids;
  if (currentGids.size === 0) return []; // Short-circuit
  intersection = new Set([...intersection].filter((gid) => currentGids.has(gid)));
  if (intersection.size === 0) return []; // Stop early
}

For long queries, placing more specific or rare words first keeps the intermediate intersection small. This is optional: token order does not change which verses are returned, and the search engine handles token order automatically.
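A common way to make intersection cost independent of token order is to process the smallest match set first. The sketch below illustrates that general idea only; it is not taken from the library's source, and `intersectSmallestFirst` is a hypothetical name.

```typescript
// Intersect gid sets starting from the smallest, so the running result stays small.
const intersectSmallestFirst = (sets: Set<number>[]): Set<number> => {
  if (sets.length === 0) return new Set<number>();
  // Copy before sorting so the caller's array is not reordered.
  const ordered = [...sets].sort((a, b) => a.size - b.size);
  let result = new Set<number>(ordered[0]);
  for (let i = 1; i < ordered.length && result.size > 0; i++) {
    const current = ordered[i];
    result = new Set<number>(Array.from(result).filter((gid) => current.has(gid)));
  }
  return result;
};

const commonWord = new Set([1, 2, 3, 4, 5, 6, 7]);
const rareWord = new Set([3, 9]);

// The result is the same whichever order the sets are passed in.
console.log(Array.from(intersectSmallestFirst([commonWord, rareWord]))); // [3]
console.log(Array.from(intersectSmallestFirst([rareWord, commonWord]))); // [3]
```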
