Engine class: pipeline, generation, and verification API

The Engine class in engine.py provides a high-level pipeline that combines web scraping, AI text generation via OpenAI, and claim verification. It also exposes three module-level convenience functions that delegate to a shared _engine instance so you can use the module directly without instantiating Engine yourself.

Engine and the module-level functions require a valid OPENAI_API_KEY environment variable. Set it before calling generate or run.

export OPENAI_API_KEY="sk-..."

To change the default OpenAI model, set the OPENAI_MODEL environment variable before importing the module. If the variable is not set, the model defaults to "gpt-4o".

export OPENAI_MODEL="gpt-4o-mini"
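
The lookup described above can be pictured with a short sketch. The function name resolve_default_model is hypothetical and is not taken from engine.py; only the OPENAI_MODEL variable and the "gpt-4o" fallback come from this page. Because the value is read once, at import time, changing the variable after import would have no effect under this assumption.

```python
import os


def resolve_default_model(environ=None):
    """Return OPENAI_MODEL if it is set, otherwise the documented fallback.

    Hypothetical helper for illustration; engine.py may resolve it differently.
    """
    env = os.environ if environ is None else environ
    return env.get("OPENAI_MODEL", "gpt-4o")


# Evaluated once, when the module is imported.
DEFAULT_MODEL = resolve_default_model()
```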

Constructor

Engine(model=DEFAULT_MODEL, sentences_per_chunk=2, sentence_overlap=1, timeout=10)

model (str): OpenAI model name to use for generation. Defaults to the value of the OPENAI_MODEL environment variable, falling back to "gpt-4o" if the variable is not set.

sentences_per_chunk (int, default: 2): Number of sentences per chunk, passed directly to the underlying Halgorithm instance.

sentence_overlap (int, default: 1): Number of sentences to overlap between consecutive chunks, passed to Halgorithm.

timeout (int, default: 10): Stored on the instance but not currently forwarded to WebScraper. Web scraping uses a hardcoded 5-second timeout per request.
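
To make sentences_per_chunk and sentence_overlap concrete, here is a minimal sketch of overlapping sentence chunking. The chunk_sentences helper and its regex-based sentence splitter are assumptions for illustration; Halgorithm's actual splitting and chunking logic is not shown on this page and may differ.

```python
import re


def chunk_sentences(text, sentences_per_chunk=2, sentence_overlap=1):
    """Split text into chunks of N sentences that share M sentences with the previous chunk.

    Hypothetical stand-in for the chunking Halgorithm performs.
    """
    if sentence_overlap >= sentences_per_chunk:
        raise ValueError("sentence_overlap must be smaller than sentences_per_chunk")

    # Naive splitter: break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [p.strip() for p in parts if p.strip()]

    step = sentences_per_chunk - sentence_overlap
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + sentences_per_chunk]))
        # Stop once the current window has reached the last sentence.
        if start + sentences_per_chunk >= len(sentences):
            break
    return chunks
```

With the defaults (2 sentences per chunk, 1 overlapping), each chunk after the first repeats the last sentence of the chunk before it, so a claim that spans a sentence boundary can still match a single chunk.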

Methods

run

run(prompt, urls=None, truth_file_paths=None, threshold=0.30)

Executes the full pipeline in a single call: loads sources (scraping URLs and/or reading local files), generates an AI response grounded in those sources, then verifies each claim in the response against the same sources. Raises ValueError if neither urls nor truth_file_paths produce any source documents.

prompt (str, required): The question or instruction to send to the OpenAI model.

urls (list[str], default: None): List of URLs to scrape as truth sources. Pages are scraped to plain text before being passed to the model and verifier.

truth_file_paths (list[str], default: None): List of local file paths to load as truth sources. Can be combined with urls.

threshold (float, default: 0.30): Minimum cosine similarity score passed through to Halgorithm.compare_to_docs.

Returns a dict:

claims (list[dict]): List of claim result dicts, one per extracted claim. See the claim result object reference.

summary (str): Human-readable summary string, e.g. "3/4 supported, 1/4 weak, 0/4 contradictions, 0/4 hallucinations".

ai_output (str): The raw text generated by the OpenAI model.

sources (list[str]): List of file paths and/or URLs that were used as truth sources.

# Example return value
{
    "claims": [...],
    "summary": "3/4 supported, 1/4 weak, 0/4 contradictions, 0/4 hallucinations",
    "ai_output": "BASIC was invented at Dartmouth College in 1964...",
    "sources": ["https://en.wikipedia.org/wiki/BASIC", "sources/basic.txt"]
}
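
If you need the counts from summary as numbers, a small parser along these lines may help. parse_summary is a hypothetical helper, not part of engine.py, and it assumes the summary keeps the "count/total label" shape shown above. As a precaution it also accepts a bare count without the "/total" part.

```python
import re


def parse_summary(summary):
    """Turn '3/4 supported, 1/4 weak, ...' into {'supported': 3, 'weak': 1, ...}.

    Hypothetical helper; relies on the summary format shown in this section.
    """
    counts = {}
    # A count, an optional '/total', whitespace, then a one-word label.
    for count, label in re.findall(r"(\d+)(?:/\d+)?\s+([A-Za-z]+)", summary):
        counts[label] = int(count)
    return counts
```

Reading counts from the claims list is likely more robust than parsing a human-readable string, so treat this as a convenience for logs and quick checks.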

generate

generate(prompt, source_docs=None)

Sends prompt to the OpenAI model, optionally grounded by source_docs. When source documents are provided, the model is instructed not to add facts beyond what is present in those documents.

prompt (str, required): The question or instruction to send to the model.

source_docs (list[dict], default: None): List of source document dicts with file_path and text keys, such as those returned by scrape_urls() or load_truth_files(). When omitted, the model answers from its own training data.

Returns a str containing the model’s response text.
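
One way to picture the grounding step is a helper that folds source_docs into the prompt. The function name build_grounded_prompt, the instruction wording, and the "[Source: ...]" layout are all assumptions made for this sketch; the actual prompt that engine.py sends to OpenAI is not shown on this page.

```python
def build_grounded_prompt(prompt, source_docs=None):
    """Prefix the prompt with source text and a 'no extra facts' instruction.

    Hypothetical illustration of grounding; not the library's real prompt.
    """
    if not source_docs:
        # No sources: the model would answer from its own training data.
        return prompt

    sections = [
        f"[Source: {doc['file_path']}]\n{doc['text']}" for doc in source_docs
    ]
    instruction = (
        "Answer using only the sources below. "
        "Do not add facts that are not present in them."
    )
    return instruction + "\n\n" + "\n\n".join(sections) + "\n\nQuestion: " + prompt
```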

verify

verify(ai_output, source_docs, threshold=0.30)

Runs claim verification against pre-loaded source documents without calling OpenAI. Use this when you already have an AI-generated text and want to check it independently of generation.

ai_output (str, required): The AI-generated text to verify.

source_docs (list[dict], required): List of source document dicts (with file_id, file_path, text keys) to verify against.

threshold (float, default: 0.30): Minimum cosine similarity score to avoid a HALLUCINATION classification.

Returns a dict with claims (list of claim result dicts) and summary (str).
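
The role of threshold can be sketched with a toy similarity check. This example uses a bag-of-words cosine similarity as a stand-in; how Halgorithm actually vectorizes text is not described here, and whether it compares with a strict "less than" is also an assumption. Both function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two texts using raw word counts (toy vectorizer)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def is_hallucination(claim, chunks, threshold=0.30):
    """Flag a claim when its best match against all source chunks is below threshold."""
    best = max((cosine_similarity(claim, chunk) for chunk in chunks), default=0.0)
    return best < threshold
```

Raising threshold makes the check stricter, so more claims fall below it. Lowering it lets looser matches count as having some support.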

scrape_urls

scrape_urls(urls)

Scrapes a list of URLs to plain text using WebScraper and returns them as source document dicts. Pages are written to a temporary directory during scraping and cleaned up automatically.

urls (Iterable[str], required): URLs to scrape. A warning is printed for any URL that fails to scrape; failed URLs are omitted from the result.

Returns a list of dicts, each with:

file_id (int): 1-indexed position of the URL in the input list.

file_path (str): The original URL string.

text (str): Plain text content scraped from the page.
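
The shape of the returned list can be illustrated without touching the network. In this sketch, build_source_docs and its fetch_text callback are hypothetical: the callback stands in for WebScraper, whose interface is not documented on this page. If file_id really is the position in the input list, a failed URL leaves a gap in the numbering, as below.

```python
def build_source_docs(urls, fetch_text):
    """Build source document dicts, skipping URLs whose fetch raises.

    fetch_text is a hypothetical callable (url -> str) standing in for WebScraper.
    """
    docs = []
    for file_id, url in enumerate(urls, start=1):
        try:
            text = fetch_text(url)
        except Exception as exc:
            # Mirror the documented behavior: warn and omit the failed URL.
            print(f"Warning: failed to scrape {url}: {exc}")
            continue
        docs.append({"file_id": file_id, "file_path": url, "text": text})
    return docs
```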

load_truth_files

load_truth_files(file_paths)

Loads local files from disk by delegating to Halgorithm.load_files(). Returns the same format as scrape_urls so that both can be combined as truth sources.

file_paths (Iterable[str], required): File paths to load.

Returns a list of dicts with file_id, file_path, and text keys.
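
Because both loaders return the same dict shape, their results can be concatenated into one list of truth sources. The sketch below shows that idea together with the empty-sources check that run is documented to perform. collect_sources and its error message are hypothetical, and how engine.py handles file_id values that repeat across the two lists is not specified here.

```python
def collect_sources(scraped_docs=None, file_docs=None):
    """Merge scraped and file-based source docs, failing if nothing was loaded.

    Hypothetical helper; the real pipeline may merge sources differently.
    """
    sources = list(scraped_docs or []) + list(file_docs or [])
    if not sources:
        raise ValueError("No source documents: provide urls and/or truth_file_paths")
    return sources
```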

Module-level functions

The module exposes three top-level convenience functions that use a shared Engine() instance created at import time. You do not need to instantiate Engine to use them.

engine.run

import engine

result = engine.run(prompt, urls=None, truth_file_paths=None, threshold=0.30)

Delegates to Engine.run() on the shared instance. Raises ValueError if no sources are provided.

prompt (str, required): The question or instruction for the model.

urls (list[str], default: None): URLs to scrape as truth sources.

truth_file_paths (list[str], default: None): Local file paths to load as truth sources.

threshold (float, default: 0.30): Minimum cosine similarity for claim classification.

Returns the same dict as Engine.run().

engine.generate

import engine

text = engine.generate(prompt, urls=None, truth_file_paths=None)

Loads sources from urls and/or truth_file_paths, then generates a grounded response. If no sources are provided, the model answers from its own training data.

prompt (str, required): The question or instruction for the model.

urls (list[str], default: None): URLs to scrape as grounding context.

truth_file_paths (list[str], default: None): Local file paths to use as grounding context.

Returns a str with the model’s response.

engine.verify

import engine

result = engine.verify(ai_output, urls=None, truth_file_paths=None, threshold=0.30)

Loads sources, then verifies ai_output against them. Raises ValueError if no sources are provided.

ai_output (str, required): The AI-generated text to verify.

urls (list[str], default: None): URLs to scrape as truth sources.

truth_file_paths (list[str], default: None): Local file paths to use as truth sources.

threshold (float, default: 0.30): Minimum cosine similarity for claim classification.

Returns a dict with claims and summary.

Usage examples

import engine

result = engine.run(
    prompt="When was BASIC invented and who created it?",
    urls=["https://en.wikipedia.org/wiki/BASIC"],
    threshold=0.30,
)

print(result["ai_output"])
print(result["summary"])
for claim in result["claims"]:
    print(claim["claim_id"], claim["status"], claim["score"])
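
For post-processing, it can be handy to group claim IDs by status. The helper below works on any dict shaped like the run() return value and does not call the engine. group_claims_by_status is a hypothetical name, and apart from HALLUCINATION, which is mentioned under verify, the status strings in the sample data are assumed for illustration.

```python
from collections import defaultdict


def group_claims_by_status(result):
    """Map each status to the list of claim_ids that received it."""
    groups = defaultdict(list)
    for claim in result["claims"]:
        groups[claim["status"]].append(claim["claim_id"])
    return dict(groups)


# Hand-written sample in the documented shape; values are illustrative only.
sample_result = {
    "claims": [
        {"claim_id": 1, "status": "SUPPORTED", "score": 0.82},
        {"claim_id": 2, "status": "HALLUCINATION", "score": 0.12},
        {"claim_id": 3, "status": "SUPPORTED", "score": 0.67},
    ],
    "summary": "",
}
flagged = group_claims_by_status(sample_result).get("HALLUCINATION", [])
```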
