Every AbcDatalog program is composed of two kinds of clauses: facts, which assert ground truths that are known before evaluation, and rules, which derive new facts from existing ones. Understanding the distinction between the extensional and intensional layers of a program — and the safety conditions that govern them — is the foundation for writing correct Datalog.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/HarvardPL/AbcDatalog/llms.txt
Use this file to discover all available pages before exploring further.
EDB and IDB predicates
Datalog draws a fundamental distinction between two classes of predicates:- EDB (Extensional Database) predicates are defined entirely by base facts written directly in the program. Their truth value is fixed before evaluation begins and cannot be altered by any rule. In the transitive closure example,
edgeis an EDB predicate. - IDB (Intensional Database) predicates are defined by rules and represent knowledge that is derived during evaluation.
tcis an IDB predicate — no fact fortcis stated explicitly; everytctuple is inferred by applying the rules.
AbcDatalog’s
DatalogValidator automatically classifies predicates: any predicate that appears as the head of a rule with a non-empty body is treated as IDB; predicates that appear only in body-less clauses are EDB. A predicate can appear in both positions; in that case the bodiless clauses seed the IDB with initial tuples.Facts
A fact is a clause with an empty body. It asserts that a specific ground atom is unconditionally true. Facts must be ground — all arguments are constants; no variables are allowed.Rules
A rule is a clause with a non-empty body. It states that the head atom holds whenever every premise in the body holds simultaneously. During a bottom-up evaluation, the engine repeatedly applies rules to derive new ground facts until no new facts can be produced (the least fixed point).tc(X,Y) is derived for every pair of constants (c1, c2) for which there exists an assignment of variables X, Y, Z that satisfies both body atoms simultaneously.
Safety
A rule is safe if every variable that appears in the head also appears in at least one positive (non-negated) body atom, or is explicitly unified with a constant or a bound variable. AbcDatalog’sDatalogValidator enforces this condition and raises a DatalogValidationException if it is violated.
Safety is required because an unsafe rule could derive infinitely many facts — one for each constant in an unbounded domain.
What makes a program safe?
What makes a program safe?
The validator tracks a set of bound variables for each rule. A variable
X is bound when:- It appears as an argument of a positive (non-negated) body atom, or
- It is explicitly unified with a constant via
X=a, or - It is explicitly unified with another variable that is itself bound, via
X=YwhereYis bound.
not p(X,Y)) or disunification constraints (X!=Y) do not become bound by those premises alone. Every variable in the clause head must be reachable as bound by the above rules before the validator accepts the clause.The full error message from DatalogValidator reads:“Every variable in a rule must be bound, but X is not bound in the rule … A variable X is bound if 1) it appears in a positive (non-negated) body atom, or 2) it is explicitly unified with a constant (e.g., X=a) or with a variable that is bound (e.g., X=Y, where Y is bound).”
Recursion
Rules in AbcDatalog may be recursive: a predicate’s definition may refer to itself. The transitive closure program uses left-recursive rules, wheretc appears in both the head and the body.
p may depend on q while q depends on p.
Non-linear recursion
The recursivetc rule above is linear: the IDB predicate tc appears exactly once in the body. AbcDatalog also supports non-linear (or semi-positive) recursion, where an IDB predicate appears more than once in a single rule body.
Wildcard variables
A bare underscore_ is a wildcard — it is parsed as a fresh anonymous variable each time it appears. Wildcards are useful in rule bodies when a term’s value must exist but is not needed elsewhere in the clause.
Each occurrence of
_ in a clause is a distinct variable. Writing edge(_,_) asks for edges where the first and second argument each exist (possibly the same, possibly different), not necessarily that they are equal. If you need two positions to be the same unknown value, use a named variable like Z.Putting it together: transitive closure
The complete transitive closure program fromtc.dtlg demonstrates all of the above concepts in a self-contained example.
tc(X,Y)? returns every pair (X,Y) such that Y is reachable from X along one or more edge steps.