This guide covers using Halgorithem directly in your Python code. You can work at two levels: the low-level Halgorithm class, which handles chunking and comparison without any LLM calls, or the higher-level Engine class, which orchestrates scraping, generation, and verification end-to-end. Both expose the same underlying claim-checking logic.

Choosing an approach

Use Halgorithm when you already have an AI-generated text and want to check it against a set of source documents — no LLM is involved.
from Halgorithem import Halgorithm

algo = Halgorithm(sentences_per_chunk=2, sentence_overlap=1)
docs = algo.load_files(["source.txt"])
results = algo.compare_to_docs(truth_docs=docs, ai_output="...", threshold=0.30)
algo.print_report(results)
load_files() returns a list of dicts with file_id, file_path, and text keys. Pass that list directly to compare_to_docs().

print_report() writes a formatted summary to stdout, including the confidence score and details on every flagged claim.
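To build intuition for the sentences_per_chunk and sentence_overlap arguments passed to the constructor above, here is a minimal sketch of overlapping sentence chunking. It is not Halgorithem's implementation: the function name chunk_sentences and the naive regex sentence splitter are assumptions made for illustration only.

```python
import re


def chunk_sentences(text, sentences_per_chunk=2, sentence_overlap=1):
    """Split text into overlapping chunks of whole sentences (illustrative only)."""
    if sentence_overlap >= sentences_per_chunk:
        raise ValueError("sentence_overlap must be smaller than sentences_per_chunk")
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    step = sentences_per_chunk - sentence_overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + sentences_per_chunk]))
        # Stop once the window has reached the final sentence.
        if start + sentences_per_chunk >= len(sentences):
            break
    return chunks


chunks = chunk_sentences("A one. B two. C three. D four.")
for chunk in chunks:
    print(chunk)
```

With an overlap of 1, each chunk repeats the last sentence of the previous one, so a claim that spans a sentence boundary can still match a single chunk.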

Top-level convenience functions

engine.py also exports three module-level functions that share a single internal Engine instance. You can import them directly without instantiating a class:
import engine

# Full pipeline: scrape → generate → verify
result = engine.run(prompt, urls, truth_file_paths, threshold)

# Generate AI output only (no verification)
ai_text = engine.generate(prompt, urls, truth_file_paths)

# Verify existing AI output against sources
verification = engine.verify(ai_output, urls, truth_file_paths, threshold)
Each function accepts urls, truth_file_paths, or both. At least one source must be provided, or a ValueError is raised.
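If you wrap these functions in your own code, you may want to fail fast before any scraping starts. The sketch below shows one way to apply the same "at least one source" rule up front. The helper require_sources is hypothetical and is not part of engine.py; the library's own check may differ in wording and timing.

```python
def require_sources(urls=None, truth_file_paths=None):
    """Hypothetical pre-check: normalize both source lists, or raise if both are empty."""
    urls = list(urls or [])
    truth_file_paths = list(truth_file_paths or [])
    if not urls and not truth_file_paths:
        raise ValueError("Provide at least one of urls or truth_file_paths")
    return urls, truth_file_paths


# Either source alone is enough.
print(require_sources(truth_file_paths=["source.txt"]))

# Neither source: the error surfaces before any work is done.
try:
    require_sources()
except ValueError as exc:
    print("Rejected:", exc)
```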

Interpreting results

compare_to_docs() returns a list of claim dicts. Each dict contains:
  • status — one of SUPPORTED, WEAK_SUPPORT, CONTRADICTION, or HALLUCINATION
  • claim — the sentence extracted from the AI output
  • score — cosine similarity score between the claim and the best matching chunk
  • chunk_text — the source chunk that best matched
  • unsupported_terms — proper nouns or numbers in the claim not found in any source
  • reason — set on CONTRADICTION claims (e.g. "Number mismatch", "Negation mismatch")
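The score and chunk_text fields can be pictured with a small cosine-similarity sketch. This section does not say how Halgorithem turns text into vectors, so the bag-of-words vectors below are an assumption chosen to keep the example dependency-free, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a stand-in for whatever vectors the library uses."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between the word-count vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_chunk(claim, chunks):
    """Return (score, chunk_text) for the chunk most similar to the claim."""
    scored = ((cosine_similarity(claim, c), c) for c in chunks)
    return max(scored, key=lambda pair: pair[0], default=(0.0, None))


score, chunk_text = best_chunk(
    "BASIC was developed at Dartmouth College.",
    ["The weather was mild.", "BASIC was developed in 1964 at Dartmouth College."],
)
print(round(score, 2), chunk_text)
```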
Filter by status to find only problematic claims:
hallucinations = [c for c in results if c["status"] == "HALLUCINATION"]
contradictions = [c for c in results if c["status"] == "CONTRADICTION"]

for claim in hallucinations:
    print(claim["claim"])
    print("Unsupported terms:", claim.get("unsupported_terms"))
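For a quick overview before inspecting individual claims, you can tally the results by status. The sketch below runs on hypothetical sample data shaped like the claim dicts described above; the claims, scores, and the summarize helper are made up for illustration and are not output from the library.

```python
from collections import Counter

# Hypothetical results, shaped like the claim dicts described above.
sample_results = [
    {"status": "SUPPORTED", "claim": "BASIC was developed in 1964.", "score": 0.82},
    {"status": "CONTRADICTION", "claim": "BASIC was invented in 1972.", "score": 0.58,
     "reason": "Number mismatch"},
    {"status": "HALLUCINATION", "claim": "It was created in Germany.", "score": 0.12,
     "unsupported_terms": ["Germany"]},
]


def summarize(results):
    """Count claims per status and collect the ones that need attention."""
    counts = Counter(c["status"] for c in results)
    flagged = [c for c in results if c["status"] in ("CONTRADICTION", "HALLUCINATION")]
    return counts, flagged


counts, flagged = summarize(sample_results)
print(dict(counts))
for c in flagged:
    # CONTRADICTION claims carry a reason; fall back to unsupported_terms otherwise.
    detail = c.get("reason") or c.get("unsupported_terms")
    print(f'{c["status"]}: {c["claim"]} ({detail})')
```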
Score thresholds follow this rule: a score of 0.65 or higher is SUPPORTED, a score of at least threshold (default 0.30) but below 0.65 is WEAK_SUPPORT, and anything below threshold is HALLUCINATION.
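The score rule can be written as a small function. This is a sketch of the rule as stated here, not the library's code: the names SUPPORTED_CUTOFF and status_from_score are hypothetical, and CONTRADICTION is left out because it is reported with a reason (such as a number or negation mismatch) rather than derived from the score rule.

```python
SUPPORTED_CUTOFF = 0.65  # fixed cutoff taken from the rule above


def status_from_score(score, threshold=0.30):
    """Map a similarity score to a status using only the score rule."""
    if score >= SUPPORTED_CUTOFF:
        return "SUPPORTED"
    if score >= threshold:
        return "WEAK_SUPPORT"
    return "HALLUCINATION"


for s in (0.80, 0.65, 0.40, 0.30, 0.10):
    print(s, status_from_score(s))
```

Raising threshold narrows the WEAK_SUPPORT band, so more borderline claims are reported as HALLUCINATION.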

Passing inline text as truth

compare_to_docs() accepts a plain string or list of strings as truth_docs — you do not need to load files at all. This is useful when your source content is already in memory.
results = algo.compare_to_docs(
    truth_docs="BASIC was developed in 1964 at Dartmouth College.",
    ai_output="BASIC was invented in Germany in 1972.",
    threshold=0.30,
)
