Overview
The judge_hallucination method uses OpenAI’s o3-mini model as an independent judge to analyze multiple responses and detect hallucinations. It compares the original response with responses to paraphrased queries to identify factual inconsistencies.
Method signature
PAS2.judge_hallucination(
    original_query: str,
    original_response: str,
    paraphrased_queries: List[str],
    paraphrased_responses: List[str]
) -> HallucinationJudgment
Parameters
original_query (str): The original user query.
original_response (str): The model’s response to the original query.
paraphrased_queries (List[str]): List of paraphrased versions of the original query.
paraphrased_responses (List[str]): List of model responses corresponding to each paraphrased query.
Return value
Returns a HallucinationJudgment object containing the detection verdict (hallucination_detected), a confidence_score, any conflicting_facts, the judge's detailed reasoning, and a short summary.
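Based on the fields used throughout this page, the return type can be sketched as a dataclass. This is an inferred reconstruction, not the library's actual definition, which may differ:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class HallucinationJudgment:
    """Sketch of the judge's verdict, inferred from the fields this page uses."""
    hallucination_detected: bool              # True if factual inconsistencies were found
    confidence_score: float                   # 0.0-1.0; 0.0 means no judgment was obtained
    conflicting_facts: List[Dict[str, str]]   # e.g. {"topic": ..., "conflict": ...}
    reasoning: str                            # detailed explanation from the judge
    summary: str                              # one-line summary of the verdict
```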
Judge model behavior
The method instructs the judge model to:
- Analyze all responses for factual inconsistencies
- Focus on factual discrepancies, not stylistic differences
- Identify cases where different facts are stated for the same question
- Provide structured JSON output with detection results
System prompt
The judge uses the following instructions:
You are a judge evaluating whether an AI is hallucinating across different responses to semantically equivalent questions.
Analyze all responses carefully to identify any factual inconsistencies or contradictions.
Focus on factual discrepancies, not stylistic differences.
A hallucination is when the AI states different facts in response to questions that are asking for the same information.
API configuration
The method calls OpenAI’s API with JSON mode:
response = self.openai_client.chat.completions.create(
    model=self.openai_model,  # "o3-mini"
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Evaluate these responses for hallucinations:\n\n{context}"}
    ],
    response_format={"type": "json_object"}
)
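JSON mode guarantees syntactically valid JSON in the reply, but the call itself can still fail (network errors, rate limits). A minimal sketch of guarding the invocation, where `call_judge` is a stand-in for the `chat.completions.create` call above and is not part of the PAS2 API:

```python
def safe_judge_call(call_judge, context: str):
    """Invoke the judge and return the raw JSON string, or None on failure.

    A None result signals the caller to use the fallback judgment
    described later on this page.
    """
    try:
        return call_judge(context)
    except Exception:
        return None

# Demo with a stub that always fails, standing in for a network error:
def failing_call(context):
    raise RuntimeError("simulated API error")

print(safe_judge_call(failing_call, "ctx"))  # → None
```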
Example usage
from pas2 import PAS2
detector = PAS2(
    mistral_api_key="your-mistral-key",
    openai_api_key="your-openai-key"
)

# Direct usage of the judge method
judgment = detector.judge_hallucination(
    original_query="What is the capital of France?",
    original_response="The capital of France is Paris.",
    paraphrased_queries=[
        "Which city is the capital of France?",
        "What city serves as France's capital?"
    ],
    paraphrased_responses=[
        "Paris is the capital city of France.",
        "The capital of France is Lyon."
    ]
)

print(f"Hallucination detected: {judgment.hallucination_detected}")
print(f"Confidence: {judgment.confidence_score}")
print(f"\nReasoning:\n{judgment.reasoning}")
print(f"\nSummary:\n{judgment.summary}")

if judgment.conflicting_facts:
    print("\nConflicting facts:")
    for fact in judgment.conflicting_facts:
        print(f"  {fact}")
Example output
Hallucination detected: True
Confidence: 0.95
Reasoning:
The responses show a clear factual inconsistency regarding the capital of France. The original response and the first paraphrased response correctly state that Paris is the capital, while the second paraphrased response incorrectly claims Lyon is the capital. This represents a hallucination where the model provided conflicting factual information.
Summary:
Hallucination detected with high confidence due to conflicting information about France's capital city.
Context format
The method constructs a detailed context for the judge:
Original Question: {original_query}
Original Response:
{original_response}
Paraphrased Questions and their Responses:
Paraphrased Question 1: {query_1}
Response 1:
{response_1}
Paraphrased Question 2: {query_2}
Response 2:
{response_2}
...
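The assembly of that context can be sketched as follows. This is a hypothetical reconstruction based on the template above; the library's internal formatting may differ in whitespace or wording:

```python
from typing import List

def build_context(original_query: str, original_response: str,
                  paraphrased_queries: List[str],
                  paraphrased_responses: List[str]) -> str:
    """Assemble the judge's context in the format shown above."""
    parts = [
        f"Original Question: {original_query}",
        "",
        "Original Response:",
        original_response,
        "",
        "Paraphrased Questions and their Responses:",
    ]
    # Pair each paraphrased query with its corresponding response, 1-indexed
    for i, (q, r) in enumerate(zip(paraphrased_queries, paraphrased_responses), start=1):
        parts += [f"Paraphrased Question {i}: {q}", f"Response {i}:", r, ""]
    return "\n".join(parts)
```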
Fallback judgment
If the judge API call fails, the method returns a safe fallback:
HallucinationJudgment(
    hallucination_detected=False,
    confidence_score=0.0,
    conflicting_facts=[],
    reasoning="Failed to obtain judgment from the model.",
    summary="Analysis failed due to API error."
)
A confidence_score of 0.0 indicates that the judgment could not be obtained. Always check the confidence score before relying on detection results.
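That check can be sketched as a small guard. This is a suggestion, not part of the PAS2 API; `SimpleNamespace` stands in for a HallucinationJudgment object:

```python
from types import SimpleNamespace

def is_reliable(judgment, threshold: float = 0.5) -> bool:
    """Treat a zero-confidence judgment as 'analysis failed', and
    require a minimum confidence before trusting a detection."""
    if judgment.confidence_score == 0.0:
        return False
    return judgment.confidence_score >= threshold

# Example with a stand-in for the fallback judgment:
failed = SimpleNamespace(confidence_score=0.0)
print(is_reliable(failed))  # → False
```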
Typical response structure
The judge model returns JSON in this format:
{
  "hallucination_detected": true,
  "confidence_score": 0.85,
  "conflicting_facts": [
    {
      "topic": "Moon landing date",
      "conflict": "Response 1 states July 20, Response 2 states July 21"
    }
  ],
  "reasoning": "The model provided inconsistent dates...",
  "summary": "Hallucination detected due to date inconsistency"
}
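JSON mode guarantees valid JSON but does not guarantee that every key is present, so a defensive parse can tolerate missing fields. A sketch, where `parse_judgment` is illustrative and not part of PAS2:

```python
import json

def parse_judgment(raw: str) -> dict:
    """Parse the judge model's JSON output, falling back to safe
    defaults for any missing keys."""
    data = json.loads(raw)
    return {
        "hallucination_detected": bool(data.get("hallucination_detected", False)),
        "confidence_score": float(data.get("confidence_score", 0.0)),
        "conflicting_facts": data.get("conflicting_facts", []),
        "reasoning": data.get("reasoning", ""),
        "summary": data.get("summary", ""),
    }

sample = '{"hallucination_detected": true, "confidence_score": 0.85}'
print(parse_judgment(sample)["confidence_score"])  # → 0.85
```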
Performance
- Typical response time: 2-5 seconds
- Uses OpenAI’s o3-mini model for efficient judgment
- Single API call regardless of the number of paraphrases
The judge model focuses specifically on factual inconsistencies. Stylistic differences, different phrasings of the same fact, or varying levels of detail are not considered hallucinations.
Integration with detect_hallucination
This method is typically called automatically by detect_hallucination:
# High-level API (recommended)
results = detector.detect_hallucination("Your query here")

# Low-level API (manual control)
queries = detector.generate_paraphrases("Your query here")
responses = detector.get_responses(queries)
judgment = detector.judge_hallucination(
    original_query=queries[0],
    original_response=responses[0],
    paraphrased_queries=queries[1:],
    paraphrased_responses=responses[1:]
)