Halgorithm class: methods and parameters reference

The Halgorithm class is the core of the library — it handles document loading, chunking, claim extraction, and verification. Import it from Halgorithem.

from Halgorithem import Halgorithm

Constructor

Halgorithm(sentences_per_chunk=2, sentence_overlap=1)

sentences_per_chunk

int

default:"2"

Number of sentences to include in each truth document chunk. Larger values give more context per chunk but reduce granularity.

sentence_overlap

int

default:"1"

Number of sentences to overlap between consecutive chunks. Overlap helps avoid splitting claims across chunk boundaries.
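To show how the two constructor parameters interact, here is a minimal sketch of a sliding sentence window. It assumes the window advances by sentences_per_chunk - sentence_overlap sentences per step and that a final, shorter chunk is kept. The helper name chunk_ranges is hypothetical and is not part of the library.

```python
def chunk_ranges(n_sentences: int, sentences_per_chunk: int = 2, sentence_overlap: int = 1) -> list[tuple[int, int]]:
    """Return 1-indexed, inclusive (sentence_start, sentence_end) pairs per chunk.

    Hypothetical helper: assumes the window advances by
    sentences_per_chunk - sentence_overlap sentences at each step.
    """
    if sentences_per_chunk < 1:
        raise ValueError("sentences_per_chunk must be at least 1")
    if not 0 <= sentence_overlap < sentences_per_chunk:
        raise ValueError("sentence_overlap must be in [0, sentences_per_chunk)")
    step = sentences_per_chunk - sentence_overlap
    ranges = []
    start = 0  # 0-indexed position of the first sentence in the window
    while start < n_sentences:
        end = min(start + sentences_per_chunk, n_sentences)
        ranges.append((start + 1, end))  # convert to 1-indexed, inclusive
        if end == n_sentences:
            break  # this window already reaches the last sentence
        start += step
    return ranges


print(chunk_ranges(4))  # [(1, 2), (2, 3), (3, 4)]
```

With the defaults, every pair of adjacent sentences lands together in at least one chunk, which is why a small overlap helps keep related statements in the same chunk.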

Methods

compare_to_docs

compare_to_docs(truth_docs, ai_output, threshold=0.30)

Verifies every meaningful claim in ai_output against the provided truth documents. This is the primary verification method — it accepts pre-loaded document dicts, raw strings, or lists of strings.

truth_docs

str | list[str] | list[dict]

required

The source-of-truth content to verify against. Accepts:

A single string (treated as one inline document)
A list of strings (each treated as a separate inline document)
A list of dicts with file_id (int), file_path (str), and text (str) keys — the format returned by load_files()
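All three accepted shapes can be reduced to the load_files() dict format. The sketch below shows one way to do that. The function name normalize_truth_docs and the use of None as the file_path for inline strings are assumptions for illustration, not the library's documented behavior.

```python
from typing import Union

REQUIRED_KEYS = ("file_id", "file_path", "text")


def normalize_truth_docs(truth_docs: Union[str, list[str], list[dict]]) -> list[dict]:
    """Hypothetical: coerce str | list[str] | list[dict] into load_files()-style dicts."""
    if isinstance(truth_docs, str):
        truth_docs = [truth_docs]  # a single string is one inline document
    docs = []
    for i, doc in enumerate(truth_docs, start=1):
        if isinstance(doc, str):
            # Assumption: inline documents have no file path.
            docs.append({"file_id": i, "file_path": None, "text": doc})
        elif isinstance(doc, dict) and all(k in doc for k in REQUIRED_KEYS):
            docs.append(doc)  # already in load_files() format
        else:
            raise TypeError(f"unsupported truth document at position {i}: {doc!r}")
    return docs
```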

ai_output

str

required

The AI-generated text to verify. Each sentence is extracted and checked independently.

threshold

float

default:"0.30"

Minimum cosine similarity score required to avoid a HALLUCINATION classification. Claims scoring at or above threshold but below 0.65 are classified as WEAK_SUPPORT; claims scoring 0.65 or above are SUPPORTED.
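The score bands described above map to labels as in this sketch. It models only the similarity thresholds; the library can also report other labels (such as the CONTRADICTION shown in the print_report sample), which this sketch does not attempt. The function name label_for_score and the constant name SUPPORTED_CUTOFF are hypothetical.

```python
SUPPORTED_CUTOFF = 0.65  # fixed upper band boundary described in this reference


def label_for_score(score: float, threshold: float = 0.30) -> str:
    """Hypothetical: map a cosine similarity score to a verdict label."""
    if score < threshold:
        return "HALLUCINATION"  # below the minimum required score
    if score < SUPPORTED_CUTOFF:
        return "WEAK_SUPPORT"  # at or above threshold, below 0.65
    return "SUPPORTED"  # 0.65 or above
```

Raising threshold makes the check stricter: more low-scoring claims fall into HALLUCINATION, while the SUPPORTED band stays fixed at 0.65.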

Returns a list of claim result dicts, one per extracted claim. See the claim result object reference for the full field list.

compare_to_files

compare_to_files(truth_file_paths, ai_output, threshold=0.30)

Loads files from disk, then runs the same verification pipeline as compare_to_docs. Use this when you have local source files rather than pre-loaded text.

truth_file_paths

list[str]

required

List of file path strings to load as truth documents. Each file is read as UTF-8 text.

ai_output

str

required

The AI-generated text to verify.

threshold

float

default:"0.30"

Minimum cosine similarity score to avoid a HALLUCINATION classification.

Returns the same list of claim result dicts as compare_to_docs.

compare_with_reasoning

compare_with_reasoning(truth_file_paths, ai_output, threshold=0.30)

Alias for compare_to_files — identical signature and return value. Provided for API compatibility.

truth_file_paths

list[str]

required

List of file path strings to load as truth documents.

ai_output

str

required

The AI-generated text to verify.

threshold

float

default:"0.30"

Minimum cosine similarity score to avoid a HALLUCINATION classification.
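The relationship between the three comparison methods can be sketched with a stand-in class. HalgorithmSketch is hypothetical, and its load_files and compare_to_docs are placeholders that only echo their inputs. What the sketch shows is the call structure: compare_to_files loads the paths and hands them to compare_to_docs, and compare_with_reasoning is the same function bound to a second name.

```python
class HalgorithmSketch:
    """Hypothetical stand-in showing how the comparison methods relate."""

    def load_files(self, file_paths):
        # Placeholder: the real method reads each path as UTF-8 text.
        return [{"file_id": i, "file_path": p, "text": ""} for i, p in enumerate(file_paths, start=1)]

    def compare_to_docs(self, truth_docs, ai_output, threshold=0.30):
        # Placeholder: the real method returns a list of claim result dicts.
        return {"n_docs": len(truth_docs), "ai_output": ai_output, "threshold": threshold}

    def compare_to_files(self, truth_file_paths, ai_output, threshold=0.30):
        # Load the files, then reuse the compare_to_docs pipeline.
        docs = self.load_files(truth_file_paths)
        return self.compare_to_docs(docs, ai_output, threshold)

    # Alias: both names refer to the same function object,
    # so the signature and return value are identical.
    compare_with_reasoning = compare_to_files
```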

load_files

load_files(file_paths)

Reads multiple files from disk and returns them as a list of document dicts ready to pass into compare_to_docs.

file_paths

list[str]

required

List of file path strings to load.

Returns a list of dicts, each with:

file_id

int

1-indexed position of the document in the list.

file_path

str

The original file path string as provided.

text

str

Full UTF-8 text content of the file.

load_file

load_file(file_path)

Reads a single file from disk and returns its raw text content.

file_path

str

required

Path to the file to read. Raises FileNotFoundError if the path does not exist, and ValueError if the path is not a file.

Returns a str with the full UTF-8 content of the file.
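The documented behavior of load_file, and the dict shape that load_files returns, can be sketched with pathlib. Both functions below are standalone stand-ins for the methods. Building load_files on top of load_file is an assumption; this reference does not say how the library composes them.

```python
from pathlib import Path


def load_file(file_path: str) -> str:
    """Sketch of load_file: return the file's full UTF-8 text."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"No such file: {file_path}")
    if not path.is_file():
        raise ValueError(f"Not a file: {file_path}")  # e.g. a directory
    return path.read_text(encoding="utf-8")


def load_files(file_paths: list[str]) -> list[dict]:
    """Sketch of load_files: one dict per path, with a 1-indexed file_id."""
    return [
        {"file_id": i, "file_path": p, "text": load_file(p)}
        for i, p in enumerate(file_paths, start=1)
    ]
```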

print_report

print_report(results)

Prints a formatted verification report to stdout. The report shows overall confidence, counts of supported/weak/problematic claims, and detailed information for each hallucination or contradiction.

results

list[dict]

required

List of claim result dicts returned by compare_to_docs or compare_to_files.

Returns None. Output format:

Halgorithm Report
================================================================================
Strongly supported: 3  Weak: 1  Issues: 1
Confidence: 75.0%  —  not reliable
================================================================================
Claim #3 | CONTRADICTION | score 0.712

BASIC was developed in 1972.

Reason: Number mismatch
AI numbers: ['1972']  Truth numbers: ['1964']
Unsupported terms: 1972

Closest chunk (sources/basic.txt, chunk 2):
BASIC was developed in 1964 at Dartmouth College by John Kemeny and Thomas Kurtz.
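The "Number mismatch" reason in this sample comes from comparing the numbers in the claim with the numbers in the closest chunk. The sketch below reproduces that comparison for the example above. The regular expression, the rule that any claim number absent from the chunk is unsupported, and the function names are all assumptions, not the library's implementation.

```python
import re

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")  # integers and simple decimals


def extract_numbers(text: str) -> list[str]:
    """Hypothetical: numeric substrings in order of appearance."""
    return NUMBER_RE.findall(text)


def unsupported_numbers(claim: str, chunk_text: str) -> list[str]:
    """Hypothetical: numbers in the claim that never appear in the chunk."""
    truth = set(extract_numbers(chunk_text))
    return [n for n in extract_numbers(claim) if n not in truth]


claim = "BASIC was developed in 1972."
chunk = "BASIC was developed in 1964 at Dartmouth College by John Kemeny and Thomas Kurtz."
print(extract_numbers(claim), extract_numbers(chunk))  # ['1972'] ['1964']
print(unsupported_numbers(claim, chunk))  # ['1972']
```

This also explains why the claim carries a fairly high score (0.712) yet is still flagged: the sentences are semantically close, and only the number differs.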

chunk_text

chunk_text(text, doc_id=1, source_name=None)

Splits a document into overlapping sentence chunks and computes embeddings, tokens, entities, and numbers for each chunk. You do not need to call this directly — compare_to_docs calls it internally.

text

str

required

The document text to chunk.

doc_id

int

default:"1"

Identifier to attach to all chunks from this document.

source_name

str

default:"None"

Human-readable label for the source (e.g. a file path or URL) attached to each chunk.

Returns a list of chunk dicts:

doc_id

int

The doc_id value passed to chunk_text.

source_name

str | None

The source_name value passed to chunk_text.

chunk_id

int

1-indexed position of this chunk within the document.

sentence_start

int

1-indexed position of the first sentence in this chunk.

sentence_end

int

1-indexed position of the last sentence in this chunk.

text

str

The raw text of the chunk.

tokens

list[str]

Lowercased tokens extracted from the chunk text, with punctuation and stop words removed (surface forms, not lemmatized).

entities

list[str]

Named entities (proper nouns, organizations, locations, etc.) found in the chunk.

numbers

list[str]

Numeric values found in the chunk text.

embedding

Tensor

Sentence embedding tensor produced by all-MiniLM-L6-v2 via sentence-transformers.
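The tokens field can be approximated as follows. This sketch lowercases the text, keeps runs of letters and digits (which drops punctuation), and filters a small stop-word set. The STOP_WORDS subset and the name tokenize are illustrative assumptions; the library's own tokenizer and stop-word list may differ. As the field description states, words keep their surface form and are not lemmatized.

```python
import re

# Illustrative stop-word subset; the library's actual list is not documented here.
STOP_WORDS = {"a", "an", "and", "at", "by", "in", "is", "of", "on", "the", "to", "was", "were"}


def tokenize(text: str) -> list[str]:
    """Hypothetical: lowercase surface tokens with punctuation and stop words removed."""
    words = re.findall(r"[^\W_]+", text.lower())  # letters and digits, no underscores
    return [w for w in words if w not in STOP_WORDS]


print(tokenize("BASIC was developed in 1964 at Dartmouth College."))
# ['basic', 'developed', '1964', 'dartmouth', 'college']
```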

split_sentences

split_sentences(text)

Splits text into individual sentences using pysbd, a sentence boundary detection library. Text is cleaned before segmentation.

text

str

required

The text to split into sentences.

Returns a list[str] of cleaned, non-empty sentence strings.
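In pysbd itself, pysbd.Segmenter(language="en", clean=False).segment(text) returns a list of sentence strings; how the library configures its segmenter is not documented here. For readers without pysbd installed, the standard-library sketch below shows only the contract that split_sentences documents: clean the text, then return non-empty sentences. It is a deliberately naive regex stand-in, not pysbd, and it will mis-split abbreviations such as "Dr." that pysbd handles. The whitespace-collapsing step is an assumed form of "cleaning".

```python
import re


def split_sentences_naive(text: str) -> list[str]:
    """Naive stand-in for pysbd: split after . ! or ? followed by whitespace."""
    cleaned = re.sub(r"\s+", " ", text).strip()  # assumed cleaning: collapse whitespace
    parts = re.split(r"(?<=[.!?])\s+", cleaned)
    return [p.strip() for p in parts if p.strip()]  # drop empty strings


print(split_sentences_naive("BASIC was developed in 1964.\n  It ran at Dartmouth!  "))
# ['BASIC was developed in 1964.', 'It ran at Dartmouth!']
```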

support_score

support_score(claim, chunk)

Computes the cosine similarity between a claim and a chunk using sentence embeddings. This is the core scoring function used during verification.

claim

str

required

The claim sentence to score.

chunk

dict

required

A chunk dict containing at minimum an embedding key (a sentence embedding tensor, as produced by chunk_text).

Returns a float in the range [-1.0, 1.0] representing the cosine similarity between the claim and the chunk. Higher values indicate stronger semantic alignment.
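Cosine similarity is the dot product of two embedding vectors divided by the product of their lengths, which is why the result always lies in [-1.0, 1.0]. The sketch uses plain Python lists in place of the sentence-transformers tensors the library works with (sentence_transformers.util.cos_sim computes the same quantity on tensors). The function name cosine_similarity is hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b, in [-1.0, 1.0]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Because the score depends only on direction, not length, a short claim and a longer chunk can still score highly when their embeddings point the same way.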

is_meaningful_claim

is_meaningful_claim(claim)

Filters out trivial or summary sentences that cannot be meaningfully verified — for example, sentences with demonstrative subjects like “This reflects…” or sentences with fewer than 4 tokens. Only claims that pass this filter are included in verification results.

claim

str

required

The sentence to evaluate.

Returns True if the claim is verifiable, False if it should be skipped.
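A filter in the spirit of the two documented rules might look like this. The demonstrative word list, the whitespace-based token count, and the name looks_meaningful are assumptions. The sketch treats any sentence that opens with a demonstrative as summary text, which is cruder than checking that the demonstrative is actually the subject; the library's real rules may differ.

```python
DEMONSTRATIVES = {"this", "that", "these", "those"}
MIN_TOKENS = 4  # documented rule: sentences with fewer than 4 tokens are skipped


def looks_meaningful(claim: str) -> bool:
    """Hypothetical approximation of is_meaningful_claim."""
    words = [w.strip(".,;:!?\"'“”") for w in claim.split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation
    if len(words) < MIN_TOKENS:
        return False  # too short to verify
    if words[0].lower() in DEMONSTRATIVES:
        return False  # summary sentence such as "This reflects ..."
    return True
```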

classify_claim_type

classify_claim_type(claim)

Determines whether a claim contains a mathematical expression that can be verified by direct evaluation, or whether it should be verified against source documents.

claim

str

required

The claim sentence to classify.

Returns "MATH" if the claim contains an arithmetic expression (e.g. 2 + 2 = 4, 50%), or "SOURCE" otherwise.
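A pattern-based approximation of this classification is sketched below. The two regular expressions (a simple "a op b = c" equation and a percentage) and the name claim_type are assumptions inferred from the examples above; the library may recognize more forms.

```python
import re

# Assumed patterns: "a <op> b = c" equations and percentages such as "50%".
EQUATION_RE = re.compile(r"\d+(?:\.\d+)?\s*[-+*/×÷]\s*\d+(?:\.\d+)?\s*=\s*-?\d")
PERCENT_RE = re.compile(r"\d+(?:\.\d+)?\s*%")


def claim_type(claim: str) -> str:
    """Hypothetical approximation of classify_claim_type: "MATH" or "SOURCE"."""
    if EQUATION_RE.search(claim) or PERCENT_RE.search(claim):
        return "MATH"  # can be checked by direct evaluation
    return "SOURCE"  # must be checked against the truth documents
```

Requiring an "=" keeps ranges such as "1964-1972" from being mistaken for subtraction.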

Usage example

from Halgorithem import Halgorithm

algo = Halgorithm(sentences_per_chunk=2, sentence_overlap=1)

# Load from files
docs = algo.load_files(["sources/basic.txt", "sources/basic2.txt"])

# Compare AI output
results = algo.compare_to_docs(
    truth_docs=docs,
    ai_output="BASIC was developed in 1964 at Dartmouth College.",
    threshold=0.30
)

algo.print_report(results)

You can pass a plain string or a list of strings directly to compare_to_docs without calling load_files first. Use load_files only when you want to reuse the same loaded documents across multiple calls.
