The Quran Search Engine supports multi-word queries and uses AND logic to ensure all query tokens are present in matching verses.
Query tokenization
When you search for multiple words, the query is automatically tokenized:
const arabicOnly = query.replace(/[^\u0621-\u064A\s]/g, '').trim();
const cleanQuery = normalizeArabic(arabicOnly);
const queryTokens = cleanQuery.split(/\s+/);
Process:
- Strip non-Arabic characters (keeps only Arabic letters and spaces)
- Normalize the query (unify alef variants, remove diacritics)
- Split by whitespace into individual tokens
Each word in your query becomes a separate token that must be matched independently.
Example tokenization
Input query: الله الرحمن
After processing:
queryTokens = ["الله", "الرحمن"]
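The three steps above can be sketched end to end as follows. The `normalizeArabicSketch` helper is a simplified stand-in written for this example (it removes diacritics and tatweel and unifies a few alef variants); the library's own `normalizeArabic` may behave differently. `tokenizeQuery` is likewise a hypothetical wrapper name.

```typescript
// Hypothetical stand-in for the library's normalizeArabic; the real one may differ.
const normalizeArabicSketch = (text: string): string =>
  text
    .replace(/[\u064B-\u0652\u0640]/g, '') // remove diacritics and tatweel
    .replace(/[\u0622\u0623\u0625]/g, '\u0627'); // unify alef variants (آ أ إ) to ا

// Hypothetical wrapper around the three tokenization steps.
const tokenizeQuery = (query: string): string[] => {
  // 1. Keep only Arabic letters (U+0621..U+064A) and whitespace
  const arabicOnly = query.replace(/[^\u0621-\u064A\s]/g, '').trim();
  // 2. Normalize the remaining text
  const cleanQuery = normalizeArabicSketch(arabicOnly);
  // 3. Split on whitespace; an empty query yields no tokens
  return cleanQuery ? cleanQuery.split(/\s+/) : [];
};

// Punctuation and digits are stripped before splitting.
console.log(tokenizeQuery('الله, الرحمن 123'));
```

Returning an empty array for an empty query avoids the single empty-string token that `''.split(/\s+/)` would otherwise produce.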
AND logic implementation
All search layers use AND logic: every token must match for a verse to be included in results.
Simple search AND logic
The simple search layer uses Array.every() to enforce AND logic:
export const simpleSearch = <T extends Record<string, unknown>>(
  items: T[],
  query: string,
  searchField: keyof T,
): T[] => {
  const cleanQuery = normalizeArabic(query.replace(/[^\u0600-\u06FF\s]+/g, '').trim());
  if (!cleanQuery) return [];
  const queryTokens = cleanQuery.split(/\s+/);
  return items.filter((item) => {
    const fieldValue = normalizeArabic(String(item[searchField] || ''));
    // AND logic: All tokens must be present
    return queryTokens.every((token) => fieldValue.includes(token));
  });
};
What this does:
- For each verse, check if every token appears in the text
- Only return verses where all tokens are found
- If even one token is missing, the verse is excluded
The .every() method returns true only if all tokens pass the test. If any token fails, the entire verse is rejected.
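To see the effect of `.every()` in isolation, here is a minimal sketch on a tiny in-memory dataset. The `SampleVerse` shape, the sample data, and the `containsAllTokens` helper are assumptions made for this example and are not part of the library; normalization is skipped because the sample text has no diacritics.

```typescript
// Hypothetical verse shape and sample data for illustration only.
type SampleVerse = { gid: number; text: string };

const sampleVerses: SampleVerse[] = [
  { gid: 1, text: 'بسم الله الرحمن الرحيم' },
  { gid: 2, text: 'الحمد لله رب العالمين' },
  { gid: 3, text: 'الرحمن الرحيم' },
];

// AND logic: true only if every token is a substring of the text.
const containsAllTokens = (text: string, tokens: string[]): boolean =>
  tokens.every((token) => text.includes(token));

const matchingGids = sampleVerses
  .filter((verse) => containsAllTokens(verse.text, ['الله', 'الرحمن']))
  .map((verse) => verse.gid);

console.log(matchingGids); // [1]
```

Two details are worth noting. `[].every(...)` is `true` for an empty token list, which is why an explicit empty-query guard is needed before filtering. And because `includes` is a substring test, a token also matches when it appears inside a longer word.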
Linguistic search AND logic
The advanced linguistic search performs AND logic through set intersection:
const tokens = cleanQuery.split(/\s+/);
// 1. Find matches for EACH token separately
const tokenMatches = tokens.map((token) => {
  const matchingGids = new Set<number>();
  // ... find all verses that match this token
  // (via lemma, root, or fuzzy)
  return { type: 'linguistic', gids: matchingGids };
});
// 2. Intersect results (AND logic)
if (tokenMatches.length === 0) return [];
// Start with the first token's matches
let intersection = new Set(tokenMatches[0].gids);
// Intersect with each subsequent token
for (let i = 1; i < tokenMatches.length; i++) {
  const currentGids = tokenMatches[i].gids;
  if (currentGids.size === 0) return []; // Short-circuit
  intersection = new Set([...intersection].filter((gid) => currentGids.has(gid)));
  if (intersection.size === 0) return [];
}
Process:
- Find matches for each token independently
- Start with the first token’s matching verse IDs
- Filter to keep only IDs that also match the second token
- Continue filtering for each subsequent token
- Result: Only verses that match all tokens
The algorithm short-circuits early if any token has zero matches, avoiding unnecessary computation.
Visual example of set intersection
Query: الله الرحمن
Token 1 (الله) matches verses: [1, 2, 3, 4, 5, 6, 7]
Token 2 (الرحمن) matches verses: [1, 3, 5, 8, 9]
Intersection (verses with BOTH): [1, 3, 5]
Only verses 1, 3, and 5 contain both tokens, so only they are returned.
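The same example can be reproduced with a small standalone helper. `intersectAll` is a hypothetical name used for this sketch; it mirrors the loop shown earlier but is not taken from the library.

```typescript
// Intersect any number of gid sets; an empty input yields an empty set.
const intersectAll = (sets: ReadonlyArray<ReadonlySet<number>>): Set<number> => {
  const [first, ...rest] = sets;
  if (first === undefined) return new Set<number>();
  let result = new Set<number>(Array.from(first));
  for (const current of rest) {
    // Keep only the gids that the current token also matched
    result = new Set(Array.from(result).filter((gid) => current.has(gid)));
    if (result.size === 0) break; // nothing left to intersect
  }
  return result;
};

const allahGids = new Set([1, 2, 3, 4, 5, 6, 7]);
const rahmanGids = new Set([1, 3, 5, 8, 9]);

console.log(Array.from(intersectAll([allahGids, rahmanGids]))); // [1, 3, 5]
```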
Per-token match types
Each token can match through different layers:
- Linguistic match: Lemma or root match found for this token
- Fuzzy match: Fuse.js found an approximate match for this token
const tokenMatches = tokens.map((token) => {
  const entry = wordMap[token];
  const matchingGids = new Set<number>();
  // Try linguistic search first
  if (entry) {
    if (options.lemma && targetLemma) {
      for (const verse of quranData) {
        const morph = morphologyMap.get(verse.gid);
        if (morph?.lemmas.some((lemma) =>
          normalizeArabic(lemma).includes(normalizeArabic(targetLemma))
        )) {
          matchingGids.add(verse.gid);
        }
      }
    }
    // ... similar for roots
  }
  if (matchingGids.size > 0) {
    return { type: 'linguistic', gids: matchingGids };
  }
  // Fallback to fuzzy for this token
  const fuseResults = fuseInstance.search(token);
  // ... process fuzzy results
  return { type: 'fuzzy', gids: fuzzyGids, fuseMatches: fuseMatchesMap };
});
Key insight:
- Each token independently tries linguistic search first
- If that fails, it falls back to fuzzy search
- The final results must satisfy all tokens via their respective match types
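The linguistic-first, fuzzy-fallback decision can be isolated in a small sketch. The `TokenMatcher` type, the `matchToken` helper, and the toy indexes below are assumptions made for illustration; they stand in for the lemma/root lookup and the Fuse.js search, which are not reproduced here.

```typescript
type TokenMatch = { type: 'linguistic' | 'fuzzy'; gids: Set<number> };
// Hypothetical matcher signature: token in, matching verse gids out.
type TokenMatcher = (token: string) => Set<number>;

// Try the linguistic matcher first; use the fuzzy matcher only if it finds nothing.
const matchToken = (
  token: string,
  linguistic: TokenMatcher,
  fuzzy: TokenMatcher,
): TokenMatch => {
  const linguisticGids = linguistic(token);
  if (linguisticGids.size > 0) return { type: 'linguistic', gids: linguisticGids };
  return { type: 'fuzzy', gids: fuzzy(token) };
};

// Toy indexes with made-up gids (not real Quran data).
const toyLinguisticIndex = new Map<string, number[]>([['صلى', [10, 20]]]);
const toyFuzzyIndex = new Map<string, number[]>([['محمدد', [20, 30]]]);

const fromIndex = (index: Map<string, number[]>): TokenMatcher =>
  (token) => new Set(index.get(token) ?? []);

// The second token is misspelled on purpose, so it has no linguistic match.
const perToken = ['صلى', 'محمدد'].map((token) =>
  matchToken(token, fromIndex(toyLinguisticIndex), fromIndex(toyFuzzyIndex)),
);

console.log(perToken.map((match) => match.type)); // ['linguistic', 'fuzzy']
```

In this toy data, intersecting the two gid sets leaves only gid 20, which satisfies one token linguistically and the other through the fuzzy fallback.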
Multi-word scoring
Scoring accumulates across all matched tokens:
const queryTokens = cleanQuery.split(/\s+/);
// Check each token
for (const token of queryTokens) {
  // 1. Check exact matches for this token
  const textMatches = getPositiveTokens(verse, 'text', undefined, undefined, token, morphologyMap);
  if (textMatches.length > 0) {
    score += textMatches.length * 3;
  }
  // 2. Check lemma matches for this token
  const entry = wordMap[token];
  if (entry?.lemma) {
    const lemmaMatches = getPositiveTokens(verse, 'lemma', entry.lemma, ...);
    if (lemmaMatches.length > 0) {
      score += lemmaMatches.length * 2;
    }
  }
  // ... similar for roots
}
Scoring example
Query: الله الرحمن
Verse: “بسم الله الرحمن الرحيم”
Matches found:
- Token الله: 1 exact match → +3 points
- Token الرحمن: 1 exact match → +3 points
Total score: 6
Verses matching more tokens or matching tokens multiple times receive higher scores.
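The accumulation can be expressed as a small pure function over per-token hit counts. This is a sketch, not the library's scoring code: `TokenHitCounts` and `scoreVerse` are hypothetical names, the weights 3 (exact) and 2 (lemma) come from the snippet above, and the root weight of 1 is an assumption because the root branch is elided there.

```typescript
// Hypothetical per-token hit counts for a single verse.
type TokenHitCounts = { exact: number; lemma: number; root: number };

// exact = 3 and lemma = 2 follow the snippet above; root = 1 is an assumption.
const MATCH_WEIGHTS: TokenHitCounts = { exact: 3, lemma: 2, root: 1 };

// Sum weighted hits across all query tokens.
const scoreVerse = (hitsPerToken: TokenHitCounts[]): number =>
  hitsPerToken.reduce(
    (score, hits) =>
      score +
      hits.exact * MATCH_WEIGHTS.exact +
      hits.lemma * MATCH_WEIGHTS.lemma +
      hits.root * MATCH_WEIGHTS.root,
    0,
  );

// Query "الله الرحمن" against "بسم الله الرحمن الرحيم": one exact hit per token.
const basmalaScore = scoreVerse([
  { exact: 1, lemma: 0, root: 0 },
  { exact: 1, lemma: 0, root: 0 },
]);

console.log(basmalaScore); // 6
```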
Examples
Example 1: Two-word exact match
import { search } from 'quran-search-engine';
const response = search(
  'الله الرحمن',
  quranData,
  morphologyMap,
  wordMap,
  { lemma: true, root: true }
);
// Results: Only verses containing BOTH الله AND الرحمن
console.log(response.results);
// Example output:
// [
//   { gid: 1, matchScore: 6, matchType: 'exact', ... },
//   { gid: 3, matchScore: 4, matchType: 'lemma', ... },
//   ...
// ]
Example 2: Three-word query
const response = search(
  'الله الرحمن الرحيم',
  quranData,
  morphologyMap,
  wordMap
);
// Results: Only verses with ALL THREE words
// Tokens: ["الله", "الرحمن", "الرحيم"]
// A verse must contain الله AND الرحمن AND الرحيم
Example 3: Mixed match types
Query: صلى محمد
Possible results:
- Verse A: صلى (exact) + محمد (exact) → score: 6, matchType: 'exact'
- Verse B: يصلون (lemma for صلى) + محمد (exact) → score: 5, matchType: 'exact'
- Verse C: صلاة (root for صلى) + محمد (exact) → score: 4, matchType: 'exact'
All results contain both tokens, but through different match layers. The matchType reflects the best match quality found in that verse.
Example 4: No results when missing a token
Query: الله قرآن
Verse text: “بسم الله الرحمن الرحيم”
Result: This verse is excluded because it contains الله but not قرآن. Both tokens must match.
Why AND logic?
AND logic provides:
✓ Precision: Results are more specific and relevant
✓ User expectation: Matches natural language search behavior
✓ Reduced noise: Eliminates verses that only partially match
✓ Better UX: Users can refine searches by adding words
To find verses with ANY of your search terms (OR logic), run separate searches and merge the results in your application code.
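One way to do that merge is sketched below. It assumes each result carries at least `gid` and `matchScore`, as in the example output earlier; `ScoredResult` and `mergeAsOr` are hypothetical names, and keeping the higher score for duplicates is a design choice of this sketch rather than documented library behavior. If results are paginated, each input list would need to cover every page for the union to be complete.

```typescript
// Minimal result shape assumed for this sketch.
type ScoredResult = { gid: number; matchScore: number };

// Union of several result lists, deduplicated by gid, highest score first.
const mergeAsOr = (...resultLists: ScoredResult[][]): ScoredResult[] => {
  const byGid = new Map<number, ScoredResult>();
  for (const list of resultLists) {
    for (const result of list) {
      const existing = byGid.get(result.gid);
      // Keep the best-scoring entry when a verse appears in several lists
      if (existing === undefined || result.matchScore > existing.matchScore) {
        byGid.set(result.gid, result);
      }
    }
  }
  return Array.from(byGid.values()).sort((a, b) => b.matchScore - a.matchScore);
};

// Made-up results standing in for two separate single-word searches.
const firstSearch: ScoredResult[] = [{ gid: 1, matchScore: 6 }, { gid: 3, matchScore: 4 }];
const secondSearch: ScoredResult[] = [{ gid: 3, matchScore: 5 }, { gid: 8, matchScore: 3 }];

console.log(mergeAsOr(firstSearch, secondSearch).map((result) => result.gid)); // [1, 3, 8]
```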
Common use cases
Searching for phrases
While this isn’t phrase search (word order doesn’t matter), multi-word search effectively finds verses containing all words:
// Find verses about Allah's mercy
const response = search('الله رحمة', quranData, morphologyMap, wordMap);
// Both words must appear (in any order)
Narrowing results
// Broad search
const broad = search('الله', quranData, morphologyMap, wordMap);
console.log(broad.pagination.totalResults); // e.g., 2,800 verses
// Narrowed search
const narrow = search('الله الرحمن', quranData, morphologyMap, wordMap);
console.log(narrow.pagination.totalResults); // e.g., 114 verses
Adding more words reduces the result set to more specific matches.
Topic-based search
// Find verses about prayer and fasting
const response = search('صلاة صيام', quranData, morphologyMap, wordMap);
// Both concepts must be present
The search engine optimizes multi-word queries through:
- Early termination: If any token has zero matches, stop immediately
- Set operations: Efficient intersection using the Set data structure
- Deduplication: Verses appear only once even if matched by multiple layers
for (let i = 1; i < tokenMatches.length; i++) {
  const currentGids = tokenMatches[i].gids;
  if (currentGids.size === 0) return []; // Short-circuit
  intersection = new Set([...intersection].filter((gid) => currentGids.has(gid)));
  if (intersection.size === 0) return []; // Stop early
}
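Token order never changes which verses are returned, because the intersection of the per-token sets is the same in any order. For long queries, putting more specific or rare words first can keep the intermediate intersection smaller, though the early-termination checks above limit the cost regardless of order.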
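If you build your own pipeline on top of per-token gid sets, one common way to make token order irrelevant to cost is to start from the smallest set. The sketch below shows that technique under the assumption that you already have one `Set<number>` per token; `intersectSmallestFirst` is a hypothetical helper, and nothing here confirms that the library itself reorders tokens this way.

```typescript
// Intersect gid sets, iterating only over the smallest one.
const intersectSmallestFirst = (
  sets: ReadonlyArray<ReadonlySet<number>>,
): Set<number> => {
  // Copy before sorting so the caller's array is left untouched
  const ordered = sets.slice().sort((a, b) => a.size - b.size);
  const [smallest, ...rest] = ordered;
  const result = new Set<number>();
  // Early exit: no sets at all, or some token matched nothing
  if (smallest === undefined || smallest.size === 0) return result;
  smallest.forEach((gid) => {
    // Keep a gid only if every other token matched it too
    if (rest.every((other) => other.has(gid))) result.add(gid);
  });
  return result;
};

const commonWordGids = new Set([1, 2, 3, 4, 5, 6, 7]);
const rareWordGids = new Set([1, 3, 5, 8, 9]);

// Same result whichever order the sets are passed in
console.log(Array.from(intersectSmallestFirst([commonWordGids, rareWordGids]))); // [1, 3, 5]
```

Because membership checks on a `Set` are constant time on average, the work is proportional to the size of the smallest set times the number of tokens.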