Overview

The detect_hallucination method is the primary interface for hallucination detection in PAS2. It orchestrates the complete detection pipeline: generating paraphrases of the query, fetching model responses, and judging the responses for hallucinations.

Method signature

PAS2.detect_hallucination(query: str, n_paraphrases: int = 3) -> Dict

Parameters

  • query (str, required): The user query to analyze for potential hallucinations.
  • n_paraphrases (int, default: 3): Number of paraphrased versions to generate for comparison.

Return value

Returns a Dict containing the complete detection results, with the following keys:
  • original_query (str): The input query exactly as provided.
  • original_response (str): The model’s response to the original query.
  • paraphrased_queries (List[str]): List of paraphrased versions of the query.
  • paraphrased_responses (List[str]): List of model responses to the paraphrased queries.
  • hallucination_detected (bool): Whether hallucinations were detected across the responses.
  • confidence_score (float): Confidence score between 0 and 1 for the detection result.
  • conflicting_facts (List[Dict[str, Any]]): List of conflicting facts identified by the judge model.
  • reasoning (str): Detailed explanation of the judgment from the judge model.
  • summary (str): Concise summary of the hallucination analysis.
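
For illustration, a successful call might produce a result shaped like the following. All values here are hypothetical and the lists are abridged; only the keys and types come from the reference above.

# Hypothetical shape of the returned dictionary (values illustrative only)
results = {
    "original_query": "Who was the first person to land on the moon?",
    "original_response": "Neil Armstrong was the first person to walk on the moon, in July 1969.",
    "paraphrased_queries": ["Who first set foot on the moon?", "Which person landed on the moon first?"],
    "paraphrased_responses": ["Neil Armstrong, during Apollo 11 in 1969.", "Neil Armstrong."],
    "hallucination_detected": False,
    "confidence_score": 0.95,
    "conflicting_facts": [],
    "reasoning": "All responses consistently identify Neil Armstrong as the first person on the moon.",
    "summary": "No hallucination detected; responses agree across paraphrases.",
}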

Detection pipeline

The method executes three main steps:

Step 1: Generate paraphrases

Calls generate_paraphrases to create semantically equivalent versions of the query.
all_queries = self.generate_paraphrases(query, n_paraphrases)
# Returns: [original_query, paraphrase_1, paraphrase_2, ...]

Step 2: Fetch responses

Gets responses from the model for each query (original + paraphrases).
all_responses = []
for i, q in enumerate(all_queries):
    response = self._get_single_response(q, index=i)
    all_responses.append(response)

Step 3: Judge for hallucinations

Analyzes all responses using the judge model to detect inconsistencies.
judgment = self.judge_hallucination(
    original_query=original_query,
    original_response=original_response,
    paraphrased_queries=paraphrased_queries,
    paraphrased_responses=paraphrased_responses
)

Example usage

from pas2 import PAS2

detector = PAS2(
    mistral_api_key="your-mistral-key",
    openai_api_key="your-openai-key"
)

# Detect hallucinations with default paraphrases
results = detector.detect_hallucination(
    query="Who was the first person to land on the moon?"
)

print(f"Query: {results['original_query']}")
print(f"Hallucination detected: {results['hallucination_detected']}")
print(f"Confidence: {results['confidence_score']:.2f}")
print(f"Summary: {results['summary']}")

if results['hallucination_detected']:
    print("\nConflicting facts:")
    for fact in results['conflicting_facts']:
        print(f"  - {fact}")

Example with custom paraphrases

# Generate more paraphrases for higher confidence
results = detector.detect_hallucination(
    query="What is the speed of light?",
    n_paraphrases=5
)

print(f"Analyzed {len(results['paraphrased_queries'])} paraphrases")
print(f"Confidence: {results['confidence_score']}")

Progress tracking

If a progress_callback is registered, the method reports progress through multiple stages:
def track_progress(stage, **kwargs):
    print(f"Stage: {stage}")
    if stage == "paraphrases_complete":
        print(f"  Generated {kwargs['count']} queries")
    elif stage == "responses_progress":
        print(f"  Response {kwargs['completed']}/{kwargs['total']}")

detector = PAS2(
    mistral_api_key="key",
    openai_api_key="key",
    progress_callback=track_progress
)

results = detector.detect_hallucination("Query here")

Progress stages

The method emits the following progress events:
  • starting: Detection process initiated
  • generating_paraphrases: Creating paraphrased queries
  • paraphrases_complete: Paraphrases generated successfully
  • getting_responses: Fetching model responses
  • responses_progress: Individual response received
  • responses_complete: All responses collected
  • judging: Analyzing responses for hallucinations
  • complete: Detection finished
The method processes responses sequentially to provide fine-grained progress updates. For parallel response fetching, use the get_responses method directly.
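
A rough sketch of that parallel approach follows. The exact signature of get_responses is an assumption here (a list of queries in, a list of responses out, in order); verify it against that method's own reference page before relying on it.

# Build the query set the same way the pipeline does (original + paraphrases)
all_queries = detector.generate_paraphrases("What is the speed of light?", 3)

# Assumed signature (not confirmed here): get_responses(queries: List[str]) -> List[str]
# Fetches responses in parallel, trading per-response progress events for speed.
all_responses = detector.get_responses(all_queries)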

Error handling

The method is designed to be robust:
  • Paraphrase generation failures trigger fallback paraphrases
  • Individual response errors return error messages for those specific queries
  • Judge model errors return a fallback judgment with confidence_score=0.0
All API calls are logged. Check application logs for detailed error information when detection fails.
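
In practice, callers can treat the fallback judgment as "inconclusive" rather than "no hallucination". A minimal sketch, relying only on the confidence_score=0.0 convention documented above:

results = detector.detect_hallucination("What is the speed of light?")

# A judge-model failure surfaces as a fallback judgment with confidence_score == 0.0,
# so distinguish "inconclusive" from a genuine negative result.
if results['confidence_score'] == 0.0:
    print("Judge model failed; result is inconclusive. Check the application logs.")
elif results['hallucination_detected']:
    print(f"Hallucination detected: {results['summary']}")
else:
    print("Responses are consistent across paraphrases.")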

Performance considerations

  • Total time: ~10-20 seconds for 3 paraphrases
  • Time breakdown:
    • Paraphrase generation: 1-3 seconds
    • Response fetching: 5-10 seconds (sequential)
    • Judgment: 2-5 seconds
  • Increasing n_paraphrases proportionally increases response fetching time
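
To verify these numbers in your own environment, wrap the call in a standard-library timer:

import time

# Time a single end-to-end detection run (reuses the detector from earlier examples)
start = time.perf_counter()
results = detector.detect_hallucination("What is the speed of light?", n_paraphrases=3)
elapsed = time.perf_counter() - start

print(f"Detection took {elapsed:.1f}s for {len(results['paraphrased_queries'])} paraphrases")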
