Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/HarvardPL/AbcDatalog/llms.txt

Use this file to discover all available pages before exploring further.

Every AbcDatalog program is composed of two kinds of clauses: facts, which assert ground truths that are known before evaluation, and rules, which derive new facts from existing ones. Understanding the distinction between the extensional and intensional layers of a program — and the safety conditions that govern them — is the foundation for writing correct Datalog.

EDB and IDB predicates

Datalog draws a fundamental distinction between two classes of predicates:
  • EDB (Extensional Database) predicates are defined entirely by base facts written directly in the program. Their truth value is fixed before evaluation begins and cannot be altered by any rule. In the transitive closure example, edge is an EDB predicate.
  • IDB (Intensional Database) predicates are defined by rules and represent knowledge that is derived during evaluation. tc is an IDB predicate — no fact for tc is stated explicitly; every tc tuple is inferred by applying the rules.
AbcDatalog’s DatalogValidator automatically classifies predicates: any predicate that appears as the head of a rule with a non-empty body is treated as IDB; predicates that appear only in body-less clauses are EDB. A predicate can appear in both positions; in that case the bodiless clauses seed the IDB with initial tuples.

Facts

A fact is a clause with an empty body. It asserts that a specific ground atom is unconditionally true. Facts must be ground — all arguments are constants; no variables are allowed.
% Four EDB facts defining a directed graph
edge(a,b).
edge(b,c).
edge(c,c).
edge(c,d).
Facts are collected by the validator into the set of initial facts (the EDB) and loaded into the engine before any rule is evaluated.

Rules

A rule is a clause with a non-empty body. It states that the head atom holds whenever every premise in the body holds simultaneously. During a bottom-up evaluation, the engine repeatedly applies rules to derive new ground facts until no new facts can be produced (the least fixed point).
% Base case: every direct edge is in the transitive closure
tc(X,Y) :- edge(X,Y).

% Recursive case: follow a chain through the graph
tc(X,Y) :- edge(X,Z), tc(Z,Y).
The head tc(X,Y) is derived for every pair of constants (c1, c2) for which there exists an assignment of variables X, Y, Z that satisfies both body atoms simultaneously.

Safety

A rule is safe if every variable that appears in the head also appears in at least one positive (non-negated) body atom, or is explicitly unified with a constant or a bound variable. AbcDatalog’s DatalogValidator enforces this condition and raises a DatalogValidationException if it is violated. Safety is required because an unsafe rule could derive infinitely many facts — one for each constant in an unbounded domain.
% SAFE: X and Y both appear in the positive body atom edge(X,Y)
tc(X,Y) :- edge(X,Y).

% UNSAFE (rejected): Y does not appear in any positive body atom
bad(X,Y) :- edge(X,_).
The validator tracks a set of bound variables for each rule. A variable X is bound when:
  1. It appears as an argument of a positive (non-negated) body atom, or
  2. It is explicitly unified with a constant via X=a, or
  3. It is explicitly unified with another variable that is itself bound, via X=Y where Y is bound.
Variables appearing only inside negated atoms (not p(X,Y)) or disunification constraints (X!=Y) do not become bound by those premises alone. Every variable in the clause head must be reachable as bound by the above rules before the validator accepts the clause.The full error message from DatalogValidator reads:
“Every variable in a rule must be bound, but X is not bound in the rule … A variable X is bound if 1) it appears in a positive (non-negated) body atom, or 2) it is explicitly unified with a constant (e.g., X=a) or with a variable that is bound (e.g., X=Y, where Y is bound).”

Recursion

Rules in AbcDatalog may be recursive: a predicate’s definition may refer to itself. The transitive closure program uses left-recursive rules, where tc appears in both the head and the body.
tc(X,Y) :- edge(X,Y).
tc(X,Y) :- edge(X,Z), tc(Z,Y).
AbcDatalog supports mutual recursion too: predicate p may depend on q while q depends on p.

Non-linear recursion

The recursive tc rule above is linear: the IDB predicate tc appears exactly once in the body. AbcDatalog also supports non-linear (or semi-positive) recursion, where an IDB predicate appears more than once in a single rule body.
% Non-linear transitive closure: join two partial closures
tc_nonlinear(X,Y) :- edge(X,Y).
tc_nonlinear(X,Y) :- tc_nonlinear(X,Z), tc_nonlinear(Z,Y).
Both formulations produce identical results on the same graph, but non-linear recursion can converge in fewer iterations on some engines.

Wildcard variables

A bare underscore _ is a wildcard — it is parsed as a fresh anonymous variable each time it appears. Wildcards are useful in rule bodies when a term’s value must exist but is not needed elsewhere in the clause.
% Collect all nodes from either endpoint of any edge.
% The _ in edge(X,_) means "some second argument exists, but we don't care what it is".
node(X) :- edge(X,_).
node(X) :- edge(_,X).
Each occurrence of _ in a clause is a distinct variable. Writing edge(_,_) asks for edges where the first and second argument each exist (possibly the same, possibly different), not necessarily that they are equal. If you need two positions to be the same unknown value, use a named variable like Z.

Putting it together: transitive closure

The complete transitive closure program from tc.dtlg demonstrates all of the above concepts in a self-contained example.
% EDB: base graph edges
edge(a,b).
edge(b,c).
edge(c,c).
edge(c,d).

% IDB: linear recursive transitive closure
tc(X,Y) :- edge(X,Y).
tc(X,Y) :- edge(X,Z), tc(Z,Y).

% IDB: alternative non-linear formulation
tc_nonlinear(X,Y) :- edge(X,Y).
tc_nonlinear(X,Y) :- tc_nonlinear(X,Z), tc_nonlinear(Z,Y).

% IDB: node universe derived from edges (uses wildcards)
node(X) :- edge(X,_).
node(X) :- edge(_,X).
After loading this program, the query tc(X,Y)? returns every pair (X,Y) such that Y is reachable from X along one or more edge steps.

Build docs developers (and LLMs) love