
The Problem

LLMs generate fluent text even when making up facts. A response might say “late fee is 5%” when the document says “1.5%” - both sound plausible, but only one is correct. From README.md:29-30:
“The problem: LLMs generate fluent text even when making up facts. We need to validate each claim against source documents.”
The LLMJudge validates every factual claim in the generated response against retrieved document sections.

Implementation

Location: components.py:147-308

Four-Phase Validation

From README.md:31-36:
# Instead of asking "is this response correct?", decompose the problem:

1. Extract each factual claim from the response
2. Find evidence in documents
3. Detect contradictions
4. Calculate confidence

Phase 1: Claim Extraction

Method: _extract_claims(response: str) -> List[Dict]

The judge categorizes claims because different types need different validation:

Quantitative Claims

Numbers, percentages, amounts:
# Pattern: Any sentence containing numbers or percentages
number_patterns = re.findall(r'([^.]*\d+(?:\.\d+)?%?[^.]*\.)', response)

for match in number_patterns:
    claims.append({"text": match.strip(), "type": "quantitative"})
Location: components.py:152-155

Examples:
  • “Client shall pay a late fee of 1.5% per month.”
  • “Payment is due within 30 days.”
From README.md:38:
“Quantitative claims (numbers, percentages) are easy to verify and dangerous if wrong. ‘1.5% late fee’ is verifiable; ‘5% late fee’ is a detectable hallucination.”

Temporal Claims

Timeframes, deadlines, durations:
# Pattern: Sentences with time-related keywords
time_patterns = re.findall(
    r'([^.]*(?:within|after|before|\d+\s*days?|\d+\s*months?|\d+\s*years?)[^.]*\.)',
    response,
    re.IGNORECASE
)

for match in time_patterns:
    if match.strip() not in [c["text"] for c in claims]:
        claims.append({"text": match.strip(), "type": "temporal"})
Location: components.py:158-161

Examples:
  • “Either party may terminate upon 30 days’ written notice.”
  • “Confidentiality obligations survive for 3 years.”

Obligation Claims

Contract terms with “shall”, “must”, “will”:
# Pattern: Sentences with obligation keywords
obligation_patterns = re.findall(
    r'([^.]*(?:shall|must|will|is required)[^.]*\.)',
    response,
    re.IGNORECASE
)

for match in obligation_patterns:
    if match.strip() not in [c["text"] for c in claims]:
        claims.append({"text": match.strip(), "type": "obligation"})
Location: components.py:164-167

From README.md:38:
“Obligations (shall/must) indicate contract terms that should exist in the document.”
Examples:
  • “ABC Corporation shall indemnify Client against third-party claims.”
  • “Client must maintain confidentiality of proprietary information.”

General Claims

Fallback for other factual statements:
# If no structured claims found, split by sentences
if not claims:
    sentences = re.split(r'(?<=[.!?])\s+', response)
    for sent in sentences:
        if len(sent) > 20:  # Skip very short sentences
            claims.append({"text": sent.strip(), "type": "general"})
Location: components.py:170-174
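The four extraction passes above can be assembled into a self-contained sketch. The standalone function name extract_claims is illustrative; the actual logic lives in the _extract_claims method:

```python
import re
from typing import Dict, List


def extract_claims(response: str) -> List[Dict]:
    """Categorize factual claims using the four patterns described above."""
    claims: List[Dict] = []

    def add(text: str, claim_type: str) -> None:
        text = text.strip()
        if text and text not in [c["text"] for c in claims]:
            claims.append({"text": text, "type": claim_type})

    # Quantitative: sentences containing numbers or percentages
    for m in re.findall(r'([^.]*\d+(?:\.\d+)?%?[^.]*\.)', response):
        add(m, "quantitative")

    # Temporal: sentences with time-related keywords
    for m in re.findall(
        r'([^.]*(?:within|after|before|\d+\s*days?|\d+\s*months?|\d+\s*years?)[^.]*\.)',
        response, re.IGNORECASE,
    ):
        add(m, "temporal")

    # Obligation: contract-term keywords
    for m in re.findall(r'([^.]*(?:shall|must|will|is required)[^.]*\.)',
                        response, re.IGNORECASE):
        add(m, "obligation")

    # Fallback: plain sentences when nothing structured matched
    if not claims:
        for sent in re.split(r'(?<=[.!?])\s+', response):
            if len(sent) > 20:  # skip very short sentences
                add(sent, "general")
    return claims


extract_claims("The late fee is 1.5% per month.")
# → [{'text': 'The late fee is 1.5% per month.', 'type': 'quantitative'}]
```

Note that the passes run in a fixed order with deduplication, so a sentence containing both a number and a time keyword is classified by whichever pattern matches it first.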

Phase 2: Evidence Grounding

Method: _find_supporting_quote(claim: str, context: List[Dict]) -> Optional[str]

For each claim, search retrieved sections for supporting evidence.

Number Matching (Strict)

For quantitative claims, exact number matches are required:
numbers_in_claim = re.findall(r'\d+(?:\.\d+)?%?', claim)

for section in context:
    content = section.get("content", "")
    numbers_in_content = re.findall(r'\d+(?:\.\d+)?%?', content)
    
    for num in numbers_in_claim:
        if num in numbers_in_content:
            # Find the sentence containing this number
            sentences = re.split(r'(?<=[.!?])\s+', content)
            for sent in sentences:
                if num in sent:
                    return sent.strip()  # Supporting evidence found
Location: components.py:188-195
Strict number matching prevents subtle hallucinations like “1.5%” becoming “5%” or “30 days” becoming “60 days”.
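As a standalone sketch of this matcher (the function name find_number_quote is illustrative; the logic mirrors the excerpt above):

```python
import re
from typing import Dict, List, Optional


def find_number_quote(claim: str, context: List[Dict]) -> Optional[str]:
    """Return the first context sentence sharing an exact number with the claim."""
    numbers_in_claim = re.findall(r'\d+(?:\.\d+)?%?', claim)
    for section in context:
        content = section.get("content", "")
        numbers_in_content = re.findall(r'\d+(?:\.\d+)?%?', content)
        for num in numbers_in_claim:
            if num in numbers_in_content:
                # Return the sentence that contains the matching number
                for sent in re.split(r'(?<=[.!?])\s+', content):
                    if num in sent:
                        return sent.strip()
    return None


ctx = [{"content": "Client shall pay a late fee of 1.5% per month."}]
find_number_quote("The late fee is 1.5% per month.", ctx)  # exact match → the sentence
find_number_quote("The late fee is 5% per month.", ctx)    # "5%" ≠ "1.5%" → None
```

Because matching is on the exact number string, "5%" never matches "1.5%", which is the whole point of the strict mode.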

Key Phrase Matching (Flexible)

For non-quantitative claims, use word overlap:
# Extract significant words (4+ characters)
claim_words = set(re.findall(r'\b\w{4,}\b', claim_lower))
content_words = set(re.findall(r'\b\w{4,}\b', content_lower))
overlap = claim_words & content_words

if len(overlap) >= 3:  # Require at least 3 matching words
    sentences = re.split(r'(?<=[.!?])\s+', content)
    for sent in sentences:
        sent_words = set(re.findall(r'\b\w{4,}\b', sent.lower()))
        if len(claim_words & sent_words) >= 2:
            return sent.strip()
Location: components.py:198-207
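The same idea as a self-contained function (find_phrase_quote is an illustrative name; claim_lower and content_lower in the excerpt above are the lowercased inputs):

```python
import re
from typing import Dict, List, Optional


def find_phrase_quote(claim: str, context: List[Dict]) -> Optional[str]:
    """Word-overlap grounding for non-quantitative claims."""
    # Significant words only: 4+ characters, lowercased
    claim_words = set(re.findall(r'\b\w{4,}\b', claim.lower()))
    for section in context:
        content = section.get("content", "")
        content_words = set(re.findall(r'\b\w{4,}\b', content.lower()))
        if len(claim_words & content_words) >= 3:  # section-level gate
            for sent in re.split(r'(?<=[.!?])\s+', content):
                sent_words = set(re.findall(r'\b\w{4,}\b', sent.lower()))
                if len(claim_words & sent_words) >= 2:  # sentence-level gate
                    return sent.strip()
    return None


ctx = [{"content": "Client shall maintain strict confidentiality of all proprietary information."}]
find_phrase_quote("Client must maintain confidentiality of proprietary information.", ctx)
# returns the matching sentence from the section
```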

Phase 3: Contradiction Detection

Method: _check_contradiction(claim: str, context: List[Dict]) -> bool

From README.md:40-41:
“Contradiction detection compares numbers in claims against numbers in context. If the response says ‘late fee is 5%’ but the document says ‘late fee of 1.5%’, that’s a contradiction.”

Percentage Contradictions

claim_percentages = re.findall(r'(\d+(?:\.\d+)?)\s*%', claim)

for section in context:
    content = section.get("content", "")
    content_percentages = re.findall(r'(\d+(?:\.\d+)?)\s*%', content)
    
    if content_percentages:
        # Check if discussing the same topic
        if ("late" in claim_lower or "fee" in claim_lower) and \
           ("late" in content_lower or "fee" in content_lower):
            for claim_pct in claim_percentages:
                if claim_pct not in content_percentages:
                    return True  # CONTRADICTION!
Location: components.py:221-228

From README.md:41:
“I check that both discuss the same topic (late/fee keywords) before flagging.”
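A runnable sketch of this check, including the topic gate (percent_contradiction is an illustrative name; claim_lower and content_lower in the excerpt are the lowercased inputs):

```python
import re
from typing import Dict, List


def percent_contradiction(claim: str, context: List[Dict]) -> bool:
    """Flag mismatched percentages when claim and context share the late/fee topic."""
    claim_lower = claim.lower()
    claim_pcts = re.findall(r'(\d+(?:\.\d+)?)\s*%', claim)
    for section in context:
        content = section.get("content", "")
        content_lower = content.lower()
        content_pcts = re.findall(r'(\d+(?:\.\d+)?)\s*%', content)
        # Topic gate: only compare if both sides discuss late fees
        if content_pcts and \
           ("late" in claim_lower or "fee" in claim_lower) and \
           ("late" in content_lower or "fee" in content_lower):
            for pct in claim_pcts:
                if pct not in content_pcts:
                    return True  # contradiction
    return False


percent_contradiction("Late fee is 5% per month.",
                      [{"content": "a late fee of 1.5% per month"}])    # → True
percent_contradiction("Late fee is 1.5% per month.",
                      [{"content": "a late fee of 1.5% per month"}])    # → False
```

The topic gate means an unrelated percentage (say, a discount rate) is never compared against the late-fee figure.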

Day/Timeline Contradictions

claim_days = re.findall(r'(\d+)\s*days?', claim_lower)

for section in context:
    content = section.get("content", "")
    content_days_list = re.findall(r'(\d+)\s*\)?\s*days?', content_lower)
    
    if content_days_list:
        payment_keywords = ["payment", "pay", "due", "within", "invoice", "receipt"]
        claim_has_payment = any(kw in claim_lower for kw in payment_keywords)
        content_has_payment = any(kw in content_lower for kw in payment_keywords)
        
        if claim_has_payment and content_has_payment:
            for claim_d in claim_days:
                if claim_d not in content_days_list:
                    return True  # CONTRADICTION!
Location: components.py:231-240
Context-aware contradiction detection prevents false positives. “30 days” in payment terms vs “3 years” in confidentiality is NOT a contradiction because they discuss different topics.
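The day check can be sketched the same way; note the payment-keyword gate that implements the context awareness described above (day_contradiction is an illustrative name):

```python
import re
from typing import Dict, List


def day_contradiction(claim: str, context: List[Dict]) -> bool:
    """Flag mismatched day counts only when both sides discuss payment terms."""
    claim_lower = claim.lower()
    claim_days = re.findall(r'(\d+)\s*days?', claim_lower)
    payment_keywords = ["payment", "pay", "due", "within", "invoice", "receipt"]
    for section in context:
        content_lower = section.get("content", "").lower()
        # \)? tolerates the legal style "thirty (30) days"
        content_days = re.findall(r'(\d+)\s*\)?\s*days?', content_lower)
        if content_days and \
           any(kw in claim_lower for kw in payment_keywords) and \
           any(kw in content_lower for kw in payment_keywords):
            for d in claim_days:
                if d not in content_days:
                    return True  # contradiction
    return False


ctx = [{"content": "Payment is due within thirty (30) days of receipt."}]
day_contradiction("Payment is due within 60 days.", ctx)          # → True
# "3 years" has no day count and no payment keywords, so it is never flagged
day_contradiction("Confidentiality survives for 3 years.", ctx)   # → False
```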

Phase 4: Confidence Scoring

Method: evaluate(response: str, context: List[Dict]) -> Dict

From README.md:43-44:
# Weighted scoring system:

confidence_score = 1.0
confidence_score -= (contradicted_count / total_claims) * 0.8  # Heavy penalty
confidence_score -= (unsupported_count / total_claims) * 0.3   # Moderate penalty
confidence_score = max(0.0, min(1.0, confidence_score))
Location: components.py:285-288

Penalty Weights

| Status | Penalty | Rationale |
|---|---|---|
| Contradicted | -0.8 per claim | Stating something provably wrong is severe |
| Unsupported | -0.3 per claim | Might be valid inference, less severe |
| Supported | 0.0 | No penalty for correct claims |
From README.md:43-44:
“Confidence scoring: contradictions get heavy penalty (0.8 per claim) because stating something wrong is serious. Unsupported claims get moderate penalty (0.3) because they might be valid inferences.”

Decision Threshold

is_hallucinated = contradicted_count > 0 or confidence_score < 0.5
Location: components.py:291

From README.md:44:
“Threshold at 0.5 means if more than half the claims are problematic, the response is rejected.”
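Scoring and threshold combine into a few lines. A self-contained sketch (score_verdict is an illustrative name; the real logic lives inside evaluate):

```python
from typing import Dict


def score_verdict(total: int, contradicted: int, unsupported: int) -> Dict:
    """Apply the weighted penalties and the 0.5 rejection threshold."""
    score = 1.0
    if total:
        score -= (contradicted / total) * 0.8  # heavy penalty
        score -= (unsupported / total) * 0.3   # moderate penalty
    score = max(0.0, min(1.0, score))
    return {
        "confidence_score": score,
        # Any contradiction rejects the response, regardless of score
        "is_hallucinated": contradicted > 0 or score < 0.5,
    }


score_verdict(total=2, contradicted=1, unsupported=0)
# score ≈ 0.6, but is_hallucinated is True because of the contradiction
```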

Verdict Structure

The judge returns a structured verdict (a Python dict, serializable to JSON):
{
    "claims": [
        {
            "text": "Client shall pay invoices within 30 days.",
            "type": "temporal",
            "found_in_source": True,
            "source_quote": "Client shall pay invoices within thirty (30) days of receipt.",
            "status": "supported"
        },
        {
            "text": "Late fee is 5% per month.",
            "type": "quantitative",
            "found_in_source": False,
            "source_quote": None,
            "status": "contradicted"
        }
    ],
    "confidence_score": 0.15,
    "is_hallucinated": True,
    "should_return": False,
    "summary": {
        "total_claims": 2,
        "supported": 1,
        "unsupported": 0,
        "contradicted": 1
    },
    "reasoning": "Found 1 supported, 0 unsupported, 1 contradicted claims."
}
Location: components.py:293-308

Example: Detecting Hallucination

Generated Response

"The late payment fee is 5% per month. Payment is due within 30 days."

Retrieved Context

Section: Late Payment Penalties (page 8)
"If payment is not received within thirty (30) days, Client shall be assessed
a late fee of 1.5% per month (18% annually) on the outstanding balance."

Judge Evaluation

Claim 1: “The late payment fee is 5% per month.”
  • Type: quantitative
  • Numbers in claim: [“5”]
  • Numbers in context: [“30”, “1.5”, “18”]
  • Topic match: Both mention “late” and “fee”
  • Status: CONTRADICTED (5% ≠ 1.5%)
Claim 2: “Payment is due within 30 days.”
  • Type: temporal
  • Numbers in claim: [“30”]
  • Numbers in context: [“30”, “1.5”, “18”]
  • Supporting quote: “payment is not received within thirty (30) days”
  • Status: SUPPORTED

Confidence Calculation

total_claims = 2
contradicted_count = 1
supported_count = 1
unsupported_count = 0

confidence_score = 1.0
confidence_score -= (1 / 2) * 0.8  # -0.4 for contradicted claim
confidence_score -= (0 / 2) * 0.3  # -0.0 for unsupported
confidence_score = 0.6

is_hallucinated = True  # contradicted_count > 0
should_return = False
Even though confidence is 0.6 (above 0.5), ANY contradicted claim triggers is_hallucinated = True.

Integration with Workflow

# nodes.py:29-37
async def judge_node(state: DocMindState) -> DocMindState:
    judge = LLMJudge()
    verdict = await judge.evaluate(state["generated_response"], state["retrieved_sections"])
    state["judge_verdict"] = verdict
    state["node_history"] = state.get("node_history", []) + ["judge"]
    
    # Increment retry count if hallucinated
    if verdict.get("is_hallucinated", False):
        state["retry_count"] = state.get("retry_count", 0) + 1
    
    return state
Location: nodes.py:29-37

Retry Logic

# nodes.py:47-55
def should_retry(state: DocMindState) -> str:
    verdict = state.get("judge_verdict", {})
    retry_count = state.get("retry_count", 0)
    
    # Retry if hallucinated and haven't exceeded max retries (2 attempts max)
    if verdict.get("is_hallucinated", False) and retry_count < 2:
        log_retry_attempt(retry_count + 1, 2)
        return "retry"
    return "output"
Location: nodes.py:47-55

From README.md:47-48:
“If the judge detects hallucination, the system retries retrieval. Maximum 2 retries to avoid infinite loops. If retrieval fails twice, the information probably doesn’t exist in the documents.”
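Stripped of the graph machinery, the retry semantics reduce to a plain loop. A sketch under the assumption that retrieve, generate, and judge_fn are stand-ins for the real pipeline stages (the actual flow is wired through LangGraph edges):

```python
def answer_with_retries(query, retrieve, generate, judge_fn, max_retries=2):
    """Re-run retrieval and generation when the judge flags a hallucination,
    giving up after max_retries extra attempts."""
    retries = 0
    while True:
        sections = retrieve(query)
        response = generate(query, sections)
        verdict = judge_fn(response, sections)
        if not verdict["is_hallucinated"] or retries >= max_retries:
            return response, verdict, retries
        retries += 1
```

With max_retries=2 the pipeline runs at most three times, matching the "retrieval fails twice" rule from the README.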

Design Limitations

From README.md:61:
“Judge makes single-pass decisions without revision.”
In production, you could:
  • Allow the judge to request additional context
  • Support multi-hop reasoning across sections
  • Implement chain-of-thought validation
  • Use an LLM for more nuanced claim extraction

Testing

import asyncio

from components import LLMJudge

judge = LLMJudge()

# Test claim extraction
response = "The late fee is 1.5% per month. Payment is due within 30 days."
claims = judge._extract_claims(response)
assert len(claims) == 2
assert claims[0]["type"] == "quantitative"
assert all(c["type"] in ("quantitative", "temporal") for c in claims)

# Test contradiction detection
claim = "Late fee is 5% per month"
context = [{"content": "late fee of 1.5% per month"}]
assert judge._check_contradiction(claim, context) is True

# Test evaluation: evaluate() is async, so drive it with asyncio.run
# when calling from synchronous code. Use a response that contradicts
# the context to trigger the hallucination verdict.
bad_response = "The late fee is 5% per month."
verdict = asyncio.run(judge.evaluate(bad_response, context))
assert verdict["is_hallucinated"] is True
assert verdict["confidence_score"] < 0.5

Next Steps

Agentic Retrieval

Understand how sections are retrieved

LangGraph Workflow

See the complete orchestration
