The Engine class in engine.py provides a high-level pipeline that combines web scraping, AI text generation via OpenAI, and claim verification. It also exposes three module-level convenience functions that delegate to a shared _engine instance so you can use the module directly without instantiating Engine yourself.
Engine and the module-level functions require a valid OPENAI_API_KEY environment variable. Set it before calling generate or run.
export OPENAI_API_KEY="sk-..."
To change the default OpenAI model, set the OPENAI_MODEL environment variable before importing the module. If the variable is not set, the model defaults to "gpt-4o".
export OPENAI_MODEL="gpt-4o-mini"
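
The fallback behaviour described above can be sketched in a few lines. This is only an illustration: `resolve_model` and `require_api_key` are hypothetical helper names, not functions exported by engine.py, and the real module may read the environment differently.

```python
import os


def resolve_model(env=None):
    """Return the model name, falling back to "gpt-4o" when OPENAI_MODEL is unset."""
    env = os.environ if env is None else env
    return env.get("OPENAI_MODEL", "gpt-4o")


def require_api_key(env=None):
    """Fail early with a clear message if OPENAI_API_KEY is missing or empty."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

Checking the key up front like this gives a clearer error than waiting for the first OpenAI request to fail.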

Constructor

Engine(model=DEFAULT_MODEL, sentences_per_chunk=2, sentence_overlap=1, timeout=10)
model
str
OpenAI model name to use for generation. Defaults to the value of the OPENAI_MODEL environment variable, falling back to "gpt-4o" if the variable is not set.
sentences_per_chunk
int
default:"2"
Number of sentences per chunk, passed directly to the underlying Halgorithm instance.
sentence_overlap
int
default:"1"
Number of sentences to overlap between consecutive chunks, passed to Halgorithm.
timeout
int
default:"10"
Stored on the instance but not currently forwarded to WebScraper. Web scraping uses a hardcoded 5-second timeout per request.
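
To make sentences_per_chunk and sentence_overlap concrete, here is a rough sketch of sliding-window sentence chunking. It assumes a simple regex sentence splitter and a hypothetical `chunk_sentences` helper; the actual chunking inside Halgorithm may split sentences or handle edge cases differently.

```python
import re


def chunk_sentences(text, sentences_per_chunk=2, sentence_overlap=1):
    """Group sentences into overlapping chunks (illustrative only)."""
    if sentence_overlap >= sentences_per_chunk:
        raise ValueError("sentence_overlap must be smaller than sentences_per_chunk")
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    step = sentences_per_chunk - sentence_overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + sentences_per_chunk]))
        # Stop once the window has reached the final sentence.
        if start + sentences_per_chunk >= len(sentences):
            break
    return chunks
```

With the defaults, each chunk holds two sentences and shares one sentence with the chunk before it, so a claim that spans a sentence boundary still appears whole in at least one chunk.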

Methods

run

run(prompt, urls=None, truth_file_paths=None, threshold=0.30)
Executes the full pipeline in a single call: loads sources (scraping URLs and/or reading local files), generates an AI response grounded in those sources, then verifies each claim in the response against the same sources. Raises ValueError if neither urls nor truth_file_paths produce any source documents.
prompt
str
required
The question or instruction to send to the OpenAI model.
urls
list[str]
default:"None"
List of URLs to scrape as truth sources. Pages are scraped to plain text before being passed to the model and verifier.
truth_file_paths
list[str]
default:"None"
List of local file paths to load as truth sources. Can be combined with urls.
threshold
float
default:"0.30"
Minimum cosine similarity score, passed through to Halgorithm.compare_to_docs. A claim that scores below this value is classified as HALLUCINATION.
Returns a dict:
claims
list[dict]
List of claim result dicts, one per extracted claim. See claim result object reference.
summary
str
Human-readable summary string, e.g. "3/4 supported, 1/4 weak, 0/4 contradictions, 0/4 hallucinations".
ai_output
str
The raw text generated by the OpenAI model.
sources
list[str]
List of file paths and/or URLs that were used as truth sources.
# Example return value
{
    "claims": [...],
    "summary": "3/4 supported, 1/4 weak, 0/4 contradictions, 0/4 hallucinations",
    "ai_output": "BASIC was invented at Dartmouth College in 1964...",
    "sources": ["https://en.wikipedia.org/wiki/BASIC", "sources/basic.txt"]
}
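
A returned dict of this shape can be post-processed with ordinary Python. The sketch below counts claims per status; `tally_statuses` is a hypothetical helper, and the status labels and scores in `sample` are made-up placeholders rather than output from a real run.

```python
from collections import Counter


def tally_statuses(result):
    """Count how many claims fall under each status label."""
    return Counter(claim["status"] for claim in result["claims"])


# Hand-written stand-in for the dict returned by run(); values are illustrative.
sample = {
    "claims": [
        {"claim_id": 1, "status": "SUPPORTED", "score": 0.82},
        {"claim_id": 2, "status": "HALLUCINATION", "score": 0.12},
        {"claim_id": 3, "status": "SUPPORTED", "score": 0.67},
    ],
    "summary": "",
    "ai_output": "",
    "sources": [],
}

counts = tally_statuses(sample)
```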

generate

generate(prompt, source_docs=None)
Sends prompt to the OpenAI model, optionally grounded by source_docs. When source documents are provided, the model is instructed not to add facts beyond what is present in those documents.
prompt
str
required
The question or instruction to send to the model.
source_docs
list[dict]
default:"None"
List of source document dicts with file_path and text keys, in the format returned by scrape_urls() or load_truth_files(). When omitted, the model answers from its own training data.
Returns a str containing the model’s response text.
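
One way such grounding could work is to prepend the source text and an instruction to the prompt. The following is a sketch under that assumption: `build_grounded_prompt` and the instruction wording are hypothetical, and the prompt that engine.py actually sends to OpenAI is not documented here.

```python
def build_grounded_prompt(prompt, source_docs=None):
    """Combine a prompt with source documents (illustrative only)."""
    if not source_docs:
        # No sources: the prompt is sent unchanged.
        return prompt
    sections = [
        f"[Source: {doc['file_path']}]\n{doc['text']}" for doc in source_docs
    ]
    instruction = (
        "Answer using only the sources below. "
        "Do not add facts that are not present in them."
    )
    return "\n\n".join([instruction, *sections, f"Question: {prompt}"])
```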

verify

verify(ai_output, source_docs, threshold=0.30)
Runs claim verification against pre-loaded source documents without calling OpenAI. Use this when you already have an AI-generated text and want to check it independently of generation.
ai_output
str
required
The AI-generated text to verify.
source_docs
list[dict]
required
List of source document dicts (with file_id, file_path, text keys) to verify against.
threshold
float
default:"0.30"
Minimum cosine similarity score to avoid a HALLUCINATION classification.
Returns a dict with claims (list of claim result dicts) and summary (str).
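
The role of threshold can be illustrated with a plain cosine similarity check. This sketch assumes claims and source chunks have already been turned into numeric vectors; `cosine_similarity` and `is_hallucination` are hypothetical helpers, and the rules that separate the other classifications are not covered.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an all-zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def is_hallucination(best_score, threshold=0.30):
    """A claim whose best score is below the threshold has no supporting source."""
    return best_score < threshold
```

Raising threshold makes verification stricter, because more claims fall below the cutoff.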

scrape_urls

scrape_urls(urls)
Scrapes a list of URLs to plain text using WebScraper and returns them as source document dicts. Pages are written to a temporary directory during scraping and cleaned up automatically.
urls
Iterable[str]
required
URLs to scrape. A warning is printed for any URL that fails to scrape; failed URLs are omitted from the result.
Returns a list of dicts, each with:
file_id
int
1-indexed position of the URL in the input list.
file_path
str
The original URL string.
text
str
Plain text content scraped from the page.
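
The result shape and the handling of failed URLs can be sketched with an injected fetch function. `build_source_docs` and `fetch` are hypothetical names used for illustration; the real method uses WebScraper and a temporary directory, which this sketch leaves out.

```python
def build_source_docs(urls, fetch):
    """Turn URLs into source document dicts, skipping any that fail."""
    docs = []
    # file_id is the 1-indexed position in the input, so a failure leaves a gap.
    for file_id, url in enumerate(urls, start=1):
        try:
            text = fetch(url)
        except Exception as exc:
            print(f"Warning: failed to scrape {url}: {exc}")
            continue
        docs.append({"file_id": file_id, "file_path": url, "text": text})
    return docs
```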

load_truth_files

load_truth_files(file_paths)
Loads local files from disk by delegating to Halgorithm.load_files(). Returns the same format as scrape_urls so that both can be combined as truth sources.
file_paths
Iterable[str]
required
File paths to load.
Returns a list of dicts with file_id, file_path, and text keys.

Module-level functions

The module exposes three top-level convenience functions that use a shared Engine() instance created at import time. You do not need to instantiate Engine to use them.

engine.run

import engine

result = engine.run(prompt, urls=None, truth_file_paths=None, threshold=0.30)
Delegates to Engine.run() on the shared instance. Raises ValueError if no sources are provided.
prompt
str
required
The question or instruction for the model.
urls
list[str]
default:"None"
URLs to scrape as truth sources.
truth_file_paths
list[str]
default:"None"
Local file paths to load as truth sources.
threshold
float
default:"0.30"
Minimum cosine similarity score to avoid a HALLUCINATION classification.
Returns the same dict as Engine.run().

engine.generate

import engine

text = engine.generate(prompt, urls=None, truth_file_paths=None)
Loads sources from urls and/or truth_file_paths, then generates a grounded response. If no sources are provided, the model answers from its own training data.
prompt
str
required
The question or instruction for the model.
urls
list[str]
default:"None"
URLs to scrape as grounding context.
truth_file_paths
list[str]
default:"None"
Local file paths to use as grounding context.
Returns a str with the model’s response.

engine.verify

import engine

result = engine.verify(ai_output, urls=None, truth_file_paths=None, threshold=0.30)
Loads sources, then verifies ai_output against them. Raises ValueError if no sources are provided.
ai_output
str
required
The AI-generated text to verify.
urls
list[str]
default:"None"
URLs to scrape as truth sources.
truth_file_paths
list[str]
default:"None"
Local file paths to use as truth sources.
threshold
float
default:"0.30"
Minimum cosine similarity score to avoid a HALLUCINATION classification.
Returns a dict with claims and summary.
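
The source-loading step and its ValueError can be sketched as follows. `collect_sources` is a hypothetical helper, and `scrape` and `load` stand in for scrape_urls and load_truth_files; the exact error message raised by engine.py is not known.

```python
def collect_sources(scrape, load, urls=None, truth_file_paths=None):
    """Gather source documents from both inputs, failing if none are found."""
    docs = []
    if urls:
        docs.extend(scrape(urls))
    if truth_file_paths:
        docs.extend(load(truth_file_paths))
    if not docs:
        # Also reached when every URL failed to scrape.
        raise ValueError("No source documents were loaded")
    return docs
```

Callers that accept user-supplied URLs may want to catch ValueError, since a list of URLs that all fail to scrape produces no sources.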

Usage examples

import engine

result = engine.run(
    prompt="When was BASIC invented and who created it?",
    urls=["https://en.wikipedia.org/wiki/BASIC"],
    threshold=0.30,
)

print(result["ai_output"])
print(result["summary"])
for claim in result["claims"]:
    print(claim["claim_id"], claim["status"], claim["score"])
