GitaChat uses state-of-the-art semantic search to understand the meaning behind your questions, not just keyword matches. When you ask “How do I handle anxiety?”, the system finds verses about inner peace and equanimity—even if those exact words aren’t used.
- **Vector Embeddings**: converts text into 768-dimensional mathematical representations that capture semantic meaning
- **Hybrid Search**: combines semantic similarity with keyword matching for optimal accuracy
- **BGE Model**: uses BAAI/bge-base-en-v1.5, a top-performing embedding model on MTEB benchmarks
- **Intelligent Ranking**: an advanced scoring algorithm that balances semantic relevance with keyword presence
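To make the pieces above concrete, here is a minimal sketch of embedding-based matching, assuming the sentence-transformers library; the verse text and query are illustrative, not taken from the actual corpus:

```python
from sentence_transformers import SentenceTransformer, util

# Load the model named above (BAAI/bge-base-en-v1.5, 768 dimensions)
model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Illustrative verse translation -- not the actual corpus text
verse = (
    "He whose mind remains undisturbed in sorrow and free from "
    "longing in joy attains steady wisdom."
)
query = "How do I handle anxiety?"

# Encode both texts into 768-dimensional vectors
verse_vec = model.encode(verse)
query_vec = model.encode(query)

# Cosine similarity is high even though the texts share no keywords,
# because both express the same idea
print(util.cos_sim(query_vec, verse_vec))
```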
When you submit a question, GitaChat enhances it with an instruction prefix optimized for the BGE model:
```python
# From model.py:38-42
def match(query):
    """Find the best matching verse for a query using semantic search."""
    # BGE models work best with instruction prefix for queries
    query_with_instruction = (
        f"Represent this sentence for searching relevant passages: {query}"
    )
    query_embedding = embedding_model.encode(query_with_instruction).tolist()
```
The instruction prefix "Represent this sentence for searching relevant passages:" is specifically designed for BGE models to improve retrieval accuracy. This is a best practice recommended by the model creators.
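In practice this means queries and passages are embedded asymmetrically: the prefix is applied to queries only, while verse texts are embedded as-is at index time. A minimal sketch, assuming sentence-transformers (the helper names `embed_query` and `embed_passage` are hypothetical):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

PREFIX = "Represent this sentence for searching relevant passages: "

def embed_query(query: str) -> list[float]:
    # Queries get the BGE retrieval instruction prefix
    return model.encode(PREFIX + query).tolist()

def embed_passage(text: str) -> list[float]:
    # Passages (verse translations) are embedded without a prefix
    return model.encode(text).tolist()
```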
The query embedding is compared against all 700+ verse embeddings stored in the Pinecone vector database:
```python
# From model.py:44-47
# Fetch top 8 matches from Pinecone for hybrid search
results = index.query(
    vector=query_embedding,
    top_k=8,
    include_metadata=True
)
```
The system retrieves the top 8 candidates to allow for re-ranking in the next phase.
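For orientation, the response consumed in the next step is shaped roughly like this; the values are illustrative, and only the metadata keys are taken from the re-ranking code below:

```python
# Illustrative shape of the Pinecone query response (values are made up)
results = {
    "matches": [
        {
            "id": "2-56",
            "score": 0.83,  # similarity of query vs. verse embedding
            "metadata": {
                "chapter": 2,
                "verse": 56,
                "translation": "...",
                "summary": "...",
                "commentary": "...",
            },
        },
        # ... up to top_k = 8 entries, ordered by descending score
    ]
}
```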
GitaChat implements a sophisticated hybrid approach that combines semantic similarity with keyword matching:
```python
# From model.py:52-67
# Build a list of semantic matches with scores
semantic_matches = []
for i, match in enumerate(results["matches"]):
    meta = match["metadata"]
    semantic_matches.append(
        {
            "chapter": meta["chapter"],
            "verse": meta["verse"],
            "translation": meta["translation"],
            "summary": meta.get("summary", ""),
            "commentary": meta.get("commentary", ""),
            "semantic_rank": i,
            "semantic_score": match["score"],
            "keyword_boost": 0,
        }
    )
```
For each semantic match, the algorithm analyzes keyword overlap:
```python
# From model.py:69-79
# Keyword matching: boost results that contain query terms
query_lower = query.lower()
query_terms = [term.strip() for term in query_lower.split() if len(term.strip()) > 2]
for match in semantic_matches:
    text = (match["translation"] + " " + match["summary"]).lower()
    # Count how many query terms appear in the text
    term_matches = sum(1 for term in query_terms if term in text)
    if term_matches > 0:
        # Boost based on term match ratio
        match["keyword_boost"] = term_matches / len(query_terms) if query_terms else 0
```
Query terms shorter than 3 characters are filtered out to avoid matching common words like “is”, “to”, and “at”, which carry little semantic weight.
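A worked example of the filter and boost, rerunning the logic above on an illustrative verse text (the query is simplified here because the whitespace split keeps punctuation attached to terms):

```python
query = "how to handle anxiety"
query_terms = [t.strip() for t in query.lower().split() if len(t.strip()) > 2]
print(query_terms)  # ['how', 'handle', 'anxiety'] -- 'to' is dropped

# Illustrative verse text, not from the actual corpus
text = "free from anxiety and attachment, the sage rests in the self"

# 'in' performs substring matching, so only 'anxiety' is found here
term_matches = sum(1 for term in query_terms if term in text)
boost = term_matches / len(query_terms)
print(boost)  # 1 of 3 terms matched, so keyword_boost is about 0.33
```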
The final score combines semantic similarity (0-1) with a keyword boost (weighted at 15%):
```python
# From model.py:81-87
# Re-rank: combine semantic score with keyword boost
# semantic_score is typically 0-1, keyword_boost is 0-1
for match in semantic_matches:
    match["combined_score"] = match["semantic_score"] + (match["keyword_boost"] * 0.15)

# Sort by combined score (descending)
semantic_matches.sort(key=lambda x: x["combined_score"], reverse=True)
```
Why 15% keyword weight?
- Semantic similarity is the primary signal (85%)
- Keyword matching provides a secondary boost (15%), adding at most 0.15 to the score
- This prevents false negatives when the user's exact terms are present in a verse, as the worked example after this list shows
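A worked example of the re-ranking arithmetic (scores are made up): a verse that trails slightly on semantic similarity but contains the user's exact terms can overtake a close competitor, while the 0.15 cap keeps keyword overlap from overriding a large semantic gap.

```python
# Candidate A: higher semantic score, no keyword overlap
# Candidate B: slightly lower semantic score, half the query terms present
a = {"semantic_score": 0.80, "keyword_boost": 0.0}
b = {"semantic_score": 0.78, "keyword_boost": 0.5}

for m in (a, b):
    m["combined_score"] = m["semantic_score"] + m["keyword_boost"] * 0.15

print(a["combined_score"])  # 0.80
print(b["combined_score"])  # 0.78 + 0.075 = 0.855 -> B now ranks first
```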
```python
# From config.py:11-17
import os

# Limit CPU threads to prevent contention on shared infrastructure
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import torch
torch.set_num_threads(1)
```
The system is optimized for shared infrastructure by limiting thread usage, preventing resource contention while maintaining fast response times.
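A quick way to confirm these limits took effect (a sketch; `import config` assumes the module path, and the expected values follow the snippet above):

```python
import os
import torch

import config  # assumed import; applies the thread limits above

print(os.environ["OMP_NUM_THREADS"])  # "1"
print(torch.get_num_threads())        # 1
```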
```python
# From main.py:115-143
@app.post("/api/query", response_model=dict)
@limiter.limit("30/minute")
async def query_gita(request: Request, query: Query) -> dict:
    """
    Query the Gita with the provided query string(s).

    Returns verse with contextual commentary tailored to the user's question.
    """
    try:
        from model import match
        from utils import generate_contextual_commentary

        result = match(query.query)
        if not result:
            raise HTTPException(status_code=404, detail="No matches found")

        # Generate contextual commentary that addresses the user's specific question
        try:
            contextual = generate_contextual_commentary(query.query, result)
            result["summarized_commentary"] = contextual
        except Exception as e:
            # Fall back to pre-computed summary if OpenAI fails
            logging.warning(f"Contextual commentary failed, using fallback: {e}")

        return {"status": "success", "data": result}
    except HTTPException:
        raise
    except Exception as e:
        logging.error(f"Query error: {type(e).__name__}: {e}")
        raise HTTPException(status_code=500, detail="Internal Server Error")
```
Key Features:
- Rate limiting: 30 requests per minute per IP
- Graceful fallback to the pre-computed summary if contextual commentary generation fails
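The `@limiter.limit` decorator above matches the slowapi library's API. The limiter setup itself is not shown in this excerpt, so the following wiring is an assumption based on slowapi's documented usage:

```python
from fastapi import FastAPI
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

# Key each client by IP so the 30/minute budget is per IP, not global
limiter = Limiter(key_func=get_remote_address)

app = FastAPI()
app.state.limiter = limiter
# Return HTTP 429 when a client exceeds its budget
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
```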
```python
# From main.py:59-74
@asynccontextmanager
async def lifespan(app: FastAPI):
    global all_verses_cache

    # Load all verses from Pinecone on startup
    logging.info("Loading all verses from Pinecone...")
    all_verses_cache = load_all_verses_from_pinecone()
    logging.info(f"Loaded {len(all_verses_cache)} verses")

    # Load model on startup (before any requests)
    logging.info("Loading embedding model...")
    from clients import embedding_model

    # Warm up the model with a dummy query
    embedding_model.encode("warmup")
    logging.info("Model loaded and ready!")

    yield
```
Loading the verse cache and warming up the embedding model at startup ensures the first user request is just as fast as subsequent ones.
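End to end, a client call might look like this (a sketch assuming a local server on port 8000; the request key follows the `query.query` access in the handler, and the response shape follows its return statement):

```python
import requests

resp = requests.post(
    "http://localhost:8000/api/query",  # assumed local dev address
    json={"query": "How do I handle anxiety?"},
)
body = resp.json()
# {"status": "success", "data": {"chapter": ..., "verse": ...,
#  "translation": ..., "summarized_commentary": ..., ...}}
print(body["data"]["translation"])
```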