The Halgorithm class is the core of the library: it handles document loading, chunking, claim extraction, and verification. Import it from the Halgorithem package (note that the package name is spelled differently from the class name):
from Halgorithem import Halgorithm

Constructor

Halgorithm(sentences_per_chunk=2, sentence_overlap=1)
sentences_per_chunk
int
default:"2"
Number of sentences to include in each truth document chunk. Larger values give more context per chunk but reduce granularity.
sentence_overlap
int
default:"1"
Number of sentences to overlap between consecutive chunks. Overlap helps avoid splitting claims across chunk boundaries.
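How the two parameters interact can be sketched in plain Python. The function below is a hypothetical illustration, not the library's implementation: it assumes the window advances by sentences_per_chunk minus sentence_overlap sentences at each step, which this reference does not state explicitly.

```python
def chunk_sentences(sentences, sentences_per_chunk=2, sentence_overlap=1):
    """Group sentences into overlapping windows (illustrative sketch only)."""
    if sentences_per_chunk < 1:
        raise ValueError("sentences_per_chunk must be at least 1")
    if not 0 <= sentence_overlap < sentences_per_chunk:
        raise ValueError("sentence_overlap must be in [0, sentences_per_chunk)")

    # Assumption: each new chunk starts this many sentences after the previous one.
    step = sentences_per_chunk - sentence_overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(sentences[start:start + sentences_per_chunk])
        # Stop once a window has reached the final sentence.
        if start + sentences_per_chunk >= len(sentences):
            break
    return chunks


print(chunk_sentences(["A.", "B.", "C.", "D."]))
# [['A.', 'B.'], ['B.', 'C.'], ['C.', 'D.']]
```

With the defaults, every chunk shares one sentence with its neighbour, so a claim that spans the boundary between two sentences still appears whole in at least one chunk.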

Methods

compare_to_docs

compare_to_docs(truth_docs, ai_output, threshold=0.30)
Verifies every meaningful claim in ai_output against the provided truth documents. This is the primary verification method — it accepts pre-loaded document dicts, raw strings, or lists of strings.
truth_docs
str | list[str] | list[dict]
required
The source-of-truth content to verify against. Accepts:
  • A single string (treated as one inline document)
  • A list of strings (each treated as a separate inline document)
  • A list of dicts with file_id (int), file_path (str), and text (str) keys — the format returned by load_files()
ai_output
str
required
The AI-generated text to verify. Each sentence is extracted and checked independently.
threshold
float
default:"0.30"
Minimum cosine similarity score required to avoid a HALLUCINATION classification. Claims scoring between threshold and 0.65 are classified as WEAK_SUPPORT; claims scoring 0.65 or above are SUPPORTED.
Returns a list of claim result dicts — one per extracted claim. See claim result object reference for the full field list.
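The score bands described for threshold can be written as a small function. This is a sketch of the documented cut-offs only; classify_score and the supported_at parameter are hypothetical names, and the sketch does not cover labels such as CONTRADICTION, which the sample report under print_report shows being assigned to a claim scoring above 0.65.

```python
def classify_score(score: float, threshold: float = 0.30, supported_at: float = 0.65) -> str:
    """Map a cosine similarity score to one of the three score-based labels."""
    if score < threshold:
        return "HALLUCINATION"
    if score < supported_at:
        return "WEAK_SUPPORT"
    return "SUPPORTED"


for score in (0.12, 0.30, 0.64, 0.65):
    print(score, classify_score(score))
```

Raising threshold makes verification stricter: more low-scoring claims fall into HALLUCINATION instead of WEAK_SUPPORT.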

compare_to_files

compare_to_files(truth_file_paths, ai_output, threshold=0.30)
Loads files from disk, then runs the same verification pipeline as compare_to_docs. Use this when you have local source files rather than pre-loaded text.
truth_file_paths
list[str]
required
List of file path strings to load as truth documents. Each file is read as UTF-8 text.
ai_output
str
required
The AI-generated text to verify.
threshold
float
default:"0.30"
Minimum cosine similarity score to avoid a HALLUCINATION classification.
Returns the same list of claim result dicts as compare_to_docs.

compare_with_reasoning

compare_with_reasoning(truth_file_paths, ai_output, threshold=0.30)
Alias for compare_to_files — identical signature and return value. Provided for API compatibility.
truth_file_paths
list[str]
required
List of file path strings to load as truth documents.
ai_output
str
required
The AI-generated text to verify.
threshold
float
default:"0.30"
Minimum cosine similarity score to avoid a HALLUCINATION classification.

load_files

load_files(file_paths)
Reads multiple files from disk and returns them as a list of document dicts ready to pass into compare_to_docs.
file_paths
list[str]
required
List of file path strings to load.
Returns a list of dicts, each with:
file_id
int
1-indexed position of the document in the list.
file_path
str
The original file path string as provided.
text
str
Full UTF-8 text content of the file.
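The returned structure can be reproduced with the standard library. The helper below, load_files_sketch, is a hypothetical stand-in that only illustrates the shape of the document dicts; it is not the library's code.

```python
from pathlib import Path


def load_files_sketch(file_paths):
    """Build document dicts in the documented shape (illustrative only)."""
    docs = []
    # file_id is 1-indexed, matching the position of the path in the input list.
    for file_id, file_path in enumerate(file_paths, start=1):
        docs.append({
            "file_id": file_id,
            "file_path": file_path,  # kept exactly as provided
            "text": Path(file_path).read_text(encoding="utf-8"),
        })
    return docs
```

A list in this shape is what compare_to_docs accepts as its list[dict] form of truth_docs.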

load_file

load_file(file_path)
Reads a single file from disk and returns its raw text content.
file_path
str
required
Path to the file to read. Raises FileNotFoundError if the path does not exist, and ValueError if the path is not a file.
Returns a str with the full UTF-8 content of the file.
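The documented error behaviour can be mirrored with pathlib. This is a minimal sketch under the assumption that the existence check runs before the file-type check; load_file_sketch is a hypothetical name.

```python
from pathlib import Path


def load_file_sketch(file_path: str) -> str:
    """Read one UTF-8 text file, raising the documented exception types."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"No such path: {file_path}")
    if not path.is_file():
        # For example, the path points at a directory.
        raise ValueError(f"Not a file: {file_path}")
    return path.read_text(encoding="utf-8")
```

Callers that accept user-supplied paths may want to catch both exception types around the call.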

print_report

print_report(results)
Prints a formatted verification report to stdout. The report shows overall confidence, counts of supported/weak/problematic claims, and detailed information for each hallucination or contradiction.
results
list[dict]
required
List of claim result dicts returned by compare_to_docs or compare_to_files.
Returns None. Output format:
Halgorithm Report
================================================================================
Strongly supported: 3  Weak: 1  Issues: 1
Confidence: 75.0%  —  not reliable
================================================================================
Claim #3 | CONTRADICTION | score 0.712

BASIC was developed in 1972.

Reason: Number mismatch
AI numbers: ['1972']  Truth numbers: ['1964']
Unsupported terms: 1972

Closest chunk (sources/basic.txt, chunk 2):
BASIC was developed in 1964 at Dartmouth College by John Kemeny and Thomas Kurtz.

chunk_text

chunk_text(text, doc_id=1, source_name=None)
Splits a document into overlapping sentence chunks and computes embeddings, tokens, entities, and numbers for each chunk. You do not need to call this directly — compare_to_docs calls it internally.
text
str
required
The document text to chunk.
doc_id
int
default:"1"
Identifier to attach to all chunks from this document.
source_name
str
default:"None"
Human-readable label for the source (e.g. a file path or URL) attached to each chunk.
Returns a list of chunk dicts:
doc_id
int
The doc_id value passed to chunk_text.
source_name
str | None
The source_name value passed to chunk_text.
chunk_id
int
1-indexed position of this chunk within the document.
sentence_start
int
1-indexed position of the first sentence in this chunk.
sentence_end
int
1-indexed position of the last sentence in this chunk.
text
str
The raw text of the chunk.
tokens
list[str]
Lowercased tokens extracted from the chunk text, with punctuation and stop words removed (surface forms, not lemmatized).
entities
list[str]
Named entities (proper nouns, organizations, locations, etc.) found in the chunk.
numbers
list[str]
Numeric values found in the chunk text.
embedding
Tensor
Sentence embedding tensor produced by all-MiniLM-L6-v2 via sentence-transformers.
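To make the tokens and numbers fields concrete, here is a rough regex-based approximation. The stop-word list and both patterns are assumptions chosen for illustration; the library's actual tokenizer and number extractor may behave differently.

```python
import re

# Assumption: a tiny stop-word list, only for this example.
STOP_WORDS = {"a", "an", "the", "was", "were", "is", "are",
              "in", "at", "by", "of", "and", "to"}


def extract_tokens(text):
    """Lowercase the text, drop punctuation, and remove stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def extract_numbers(text):
    """Return integer and decimal literals as strings, in order of appearance."""
    return re.findall(r"\d+(?:\.\d+)?", text)


sentence = "BASIC was developed in 1964 at Dartmouth College."
print(extract_tokens(sentence))   # ['basic', 'developed', '1964', 'dartmouth', 'college']
print(extract_numbers(sentence))  # ['1964']
```

Keeping numbers as strings makes it straightforward to compare a claim's numbers with a chunk's numbers, as in the "Number mismatch" reason shown in the sample report.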

split_sentences

split_sentences(text)
Splits text into individual sentences using pysbd, a sentence boundary detection library. Text is cleaned before segmentation.
text
str
required
The text to split into sentences.
Returns a list[str] of cleaned, non-empty sentence strings.

support_score

support_score(claim, chunk)
Computes the cosine similarity between a claim and a chunk using sentence embeddings. This is the core scoring function used during verification.
claim
str
required
The claim sentence to score.
chunk
dict
required
A chunk dict containing at minimum an embedding key (a sentence embedding tensor, as produced by chunk_text).
Returns a float in the range [-1.0, 1.0] representing the cosine similarity between the claim and the chunk. Higher values indicate stronger semantic alignment.
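For reference, cosine similarity itself is a short computation. The sketch below works on plain Python lists rather than the embedding tensors the library uses, and the zero-vector fallback is an assumption made so that the function never divides by zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite direction)
```

The three calls show the ends and midpoint of the [-1.0, 1.0] range.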

is_meaningful_claim

is_meaningful_claim(claim)
Filters out trivial or summary sentences that cannot be meaningfully verified — for example, sentences with demonstrative subjects like “This reflects…” or sentences with fewer than 4 tokens. Only claims that pass this filter are included in verification results.
claim
str
required
The sentence to evaluate.
Returns True if the claim is verifiable, False if it should be skipped.
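The two rules named above can be approximated as follows. This is a crude sketch, not the library's filter: it counts whitespace-separated words as tokens and rejects any sentence that opens with a demonstrative, both of which are assumptions and stricter or looser than the real check may be.

```python
DEMONSTRATIVES = {"this", "that", "these", "those"}


def is_meaningful_claim_sketch(claim: str, min_tokens: int = 4) -> bool:
    """Return False for very short sentences or ones opening with a demonstrative."""
    tokens = claim.split()
    if len(tokens) < min_tokens:
        return False
    # Strip surrounding punctuation before comparing the first word.
    first_word = tokens[0].strip(".,;:!?\"'").lower()
    return first_word not in DEMONSTRATIVES


print(is_meaningful_claim_sketch("This reflects a broader trend."))  # False
print(is_meaningful_claim_sketch("It works."))                       # False
print(is_meaningful_claim_sketch("BASIC was developed in 1964."))    # True
```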

classify_claim_type

classify_claim_type(claim)
Determines whether a claim contains a mathematical expression that can be verified by direct evaluation, or whether it should be verified against source documents.
claim
str
required
The claim sentence to classify.
Returns "MATH" if the claim contains an arithmetic expression (e.g. 2 + 2 = 4, 50%), or "SOURCE" otherwise.
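A pattern-based approximation of this decision might look like the sketch below. The two regular expressions are assumptions that cover only the two examples given (an equation with one binary operator, and a percentage); the library's detection rules are not specified here and are likely broader.

```python
import re

# Assumption: "number operator number =" marks an arithmetic statement.
ARITHMETIC = re.compile(r"\d+(?:\.\d+)?\s*[-+*/]\s*\d+(?:\.\d+)?\s*=")
# Assumption: a number followed by a percent sign also counts as MATH.
PERCENTAGE = re.compile(r"\d+(?:\.\d+)?\s*%")


def classify_claim_type_sketch(claim: str) -> str:
    """Return "MATH" for arithmetic-looking claims, otherwise "SOURCE"."""
    if ARITHMETIC.search(claim) or PERCENTAGE.search(claim):
        return "MATH"
    return "SOURCE"


print(classify_claim_type_sketch("2 + 2 = 4"))                     # MATH
print(classify_claim_type_sketch("Revenue grew 50%."))             # MATH
print(classify_claim_type_sketch("BASIC was developed in 1964."))  # SOURCE
```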

Usage example

from Halgorithem import Halgorithm

algo = Halgorithm(sentences_per_chunk=2, sentence_overlap=1)

# Load from files
docs = algo.load_files(["sources/basic.txt", "sources/basic2.txt"])

# Compare AI output
results = algo.compare_to_docs(
    truth_docs=docs,
    ai_output="BASIC was developed in 1964 at Dartmouth College.",
    threshold=0.30
)

algo.print_report(results)
You can pass a plain string or a list of strings directly to compare_to_docs without calling load_files first. Use load_files only when you want to reuse the same loaded documents across multiple calls.
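The three accepted input shapes can be thought of as normalizing to the same list of document dicts. The function below is a hypothetical sketch of that idea; in particular, the "inline_N" value used for file_path is a placeholder invented for this example, and the library may label inline documents differently.

```python
def normalize_truth_docs(truth_docs):
    """Coerce str, list[str], or list[dict] input into a list of document dicts."""
    if isinstance(truth_docs, str):
        truth_docs = [truth_docs]  # a single string is one inline document

    docs = []
    for position, item in enumerate(truth_docs, start=1):
        if isinstance(item, dict):
            docs.append(item)  # already in the load_files() format
        else:
            docs.append({
                "file_id": position,
                "file_path": f"inline_{position}",  # hypothetical placeholder label
                "text": item,
            })
    return docs


print(normalize_truth_docs("BASIC was developed in 1964."))
# [{'file_id': 1, 'file_path': 'inline_1', 'text': 'BASIC was developed in 1964.'}]
```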
