Welcome to DocMind
DocMind is a production-grade document RAG (Retrieval-Augmented Generation) system designed for legal firms that need precise, verifiable answers from complex documents. Unlike traditional semantic search systems that return “similar” documents, DocMind retrieves the exact pages and sections that answer the question and validates every response against the source material.

Why DocMind?
Legal documents require absolute precision. A single misinterpreted clause or hallucinated number can have serious consequences. DocMind addresses three critical challenges:

- Strategic Retrieval - Instead of returning semantically similar content, DocMind analyzes queries to understand intent, routes searches intelligently, and selects specific sections, with page numbers, that directly answer the question.
- Hallucination Detection - LLMs are fluent but can fabricate facts. DocMind uses an LLM-as-judge system that extracts factual claims, grounds them in the source documents, detects contradictions, and calculates a confidence score before returning any response.
- Intelligent Orchestration - Built on LangGraph, DocMind orchestrates a multi-stage workflow that decomposes queries, retrieves strategically, generates responses, validates them, and automatically retries if issues are detected.

Key Features
Query Decomposition
Extracts intent, entities, constraints, and temporal references from natural language queries using deterministic regex patterns for consistent results.
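To make the idea concrete, here is a minimal sketch of regex-based decomposition. The pattern sets, intent names, and result fields are hypothetical stand-ins, not DocMind's actual rules:

```python
import re

# Illustrative intent patterns; DocMind's real pattern sets are more extensive.
INTENT_PATTERNS = {
    "payment": re.compile(r"\b(payment|fee|penalt\w*|invoice)\b", re.I),
    "ip": re.compile(r"\b(intellectual property|copyright|patent|license)\b", re.I),
    "indemnification": re.compile(r"\b(indemnif\w*|hold harmless)\b", re.I),
}
# Constraints: percentages ("1.5%") and timeframes ("30 days").
CONSTRAINT_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s*%|\b\d+\s*(?:days?|months?|years?)\b", re.I)

def decompose(query: str) -> dict:
    """Deterministically extract intents and constraints from a query."""
    intents = [name for name, pat in INTENT_PATTERNS.items() if pat.search(query)]
    constraints = CONSTRAINT_PATTERN.findall(query)
    return {"query": query, "intents": intents, "constraints": constraints}

result = decompose("What is the late payment penalty after 30 days? Is it 1.5%?")
```

Because the patterns are fixed, the same query always yields the same decomposition, which is what makes downstream routing reproducible.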
Agentic Retrieval
Scores and ranks document sections based on intent mapping, entity matching, and query terms. Returns the 3-5 most relevant sections, with page numbers.
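A sketch of what intent- and entity-based scoring could look like; the weights and the section fields (`topics`, `text`, `page`) are assumptions for illustration, not DocMind's actual schema:

```python
# Score a section: heavier weight for intent matches, lighter for entity mentions.
def score_section(section: dict, intents: list[str], entities: list[str]) -> int:
    text = section["text"].lower()
    score = 5 * sum(1 for i in intents if i in section.get("topics", []))  # intent match
    score += 2 * sum(1 for e in entities if e.lower() in text)            # entity match
    return score

def top_sections(sections, intents, entities, k=5):
    scored = [(score_section(s, intents, entities), s) for s in sections]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only sections that matched something, paired with their page numbers.
    return [(s["page"], s["title"]) for score, s in scored[:k] if score > 0]

sections = [
    {"title": "Payment Terms", "page": 4, "topics": ["payment"],
     "text": "A late fee of 1.5% applies to overdue invoices."},
    {"title": "Governing Law", "page": 12, "topics": [],
     "text": "This agreement is governed by the laws of Delaware."},
]
hits = top_sections(sections, intents=["payment"], entities=["late fee"])
```

Sections with no intent or entity match score zero and are dropped, which is how irrelevant-but-similar content gets filtered out.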
LLM-as-Judge
Multi-phase validation that extracts claims, finds supporting quotes, detects contradictions, and calculates confidence scores to prevent hallucinations.
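In DocMind the claim extraction and contradiction checks are LLM calls; the sketch below substitutes a naive substring check for the judge so the surrounding bookkeeping (supported vs. unsupported claims, confidence score) is runnable. All names here are illustrative:

```python
# Ground extracted claims against source text and compute a confidence score.
def ground_claims(claims: list[str], source: str) -> dict:
    supported = [c for c in claims if c.lower() in source.lower()]
    confidence = len(supported) / len(claims) if claims else 0.0
    return {
        "supported": supported,
        "unsupported": [c for c in claims if c not in supported],
        "confidence": confidence,
    }

report = ground_claims(
    ["the late fee is 1.5% per month", "the term is 5 years"],
    "Section 4.2: the late fee is 1.5% per month, compounded.",
)
```

A response whose confidence falls below a threshold would be rejected and regenerated rather than returned to the user.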
LangGraph Workflow
State-driven orchestration with automatic retry logic when responses fail validation. Tracks node history and execution paths for observability.
How It Works
DocMind processes every query through a multi-stage pipeline:

Query Decomposition
The system analyzes your natural language question to extract intent (payment, IP, indemnification), entities (penalty, late fee), constraints (percentages, timeframes), and temporal references.
Strategic Retrieval
Based on the decomposition, DocMind selects the optimal search strategy (full-text, hybrid, or vector) and scores sections using intent mapping, entity matching, and relevance thresholds.
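A hypothetical routing rule for the strategy choice; the thresholds and ordering are illustrative, not DocMind's actual logic:

```python
# Map a query decomposition to one of the three search strategies.
def pick_strategy(decomposition: dict) -> str:
    if decomposition.get("constraints"):  # exact figures/timeframes favor keyword search
        return "full_text"
    if decomposition.get("intents"):      # a recognized topic favors combined signals
        return "hybrid"
    return "vector"                       # open-ended questions fall back to semantic search
```

Because the decomposition is deterministic, the same query always routes to the same strategy.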
Response Generation
Retrieved sections are synthesized into a coherent response with specific page number citations for every claim.
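The citation requirement can be sketched as follows; the field names and formatting are assumptions, showing only that every synthesized claim carries the page of the section it came from:

```python
# Assemble a response where each claim is tagged with its source page.
def cite(sections: list[dict]) -> str:
    return " ".join(f"{s['summary']} (p. {s['page']})" for s in sections)

answer = cite([
    {"summary": "The late fee is 1.5% per month.", "page": 4},
    {"summary": "Fees are waived during force majeure.", "page": 9},
])
```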
Architecture Overview
DocMind is built with clean separation of concerns:

- Query decomposition and intent detection
- Strategic section retrieval with relevance scoring
- Response generation with page citations
- Multi-phase validation with claim grounding
- Automatic retry on validation failures (max 2 attempts)
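The validate-and-retry loop above can be sketched in plain Python. DocMind implements this as a LangGraph state graph; the loop, state fields, and node names below are an illustrative stand-in, not the actual implementation:

```python
# Generate, validate, and retry up to max_retries times on validation failure,
# recording node history for observability.
def run_pipeline(query, generate, validate, max_retries=2):
    state = {"query": query, "history": []}
    for attempt in range(max_retries + 1):
        state["response"] = generate(state)
        state["history"].append(f"generate#{attempt}")
        verdict = validate(state)
        state["history"].append(f"validate#{attempt}")
        if verdict["passed"]:
            return state
    state["response"] = None  # every attempt failed validation
    return state
```

The recorded history makes it possible to see, after the fact, exactly which nodes ran and how many retries a query needed.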
Real-World Performance
DocMind achieves:

- 90%+ hallucination detection with structured rubric evaluation
- Sub-2 second latency for most queries using deterministic decomposition
- Zero false positives by distinguishing between contradictions and valid inferences
- Precise section selection with intent-based scoring (5-7 points for intent matches)
DocMind uses regex-based query decomposition instead of LLM calls for speed and determinism. This works well for legal documents with consistent terminology but may need adaptation for domains with more varied vocabulary.
Get Started
Quickstart Guide
Install DocMind and run your first query in under 5 minutes
Core Concepts
Deep dive into query decomposition, agentic retrieval, and LLM validation
API Reference
Explore the complete API documentation
Testing Guide
See how to test DocMind with real legal document queries