Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/HarvardPL/AbcDatalog/llms.txt

Use this file to discover all available pages before exploring further.

Provenance tracking records, for each derived fact, which ground rule instance (a fully substituted clause with no variables) was the last step used to derive it. By recursively calling getJustification() on the body atoms of that rule instance, you can reconstruct a complete derivation tree explaining why a fact holds in the program.

The DatalogEngineWithProvenance interface

DatalogEngineWithProvenance extends DatalogEngine with one additional method:
Clause getJustification(PositiveAtom fact);
  • fact — a ground atom to look up.
  • Returns the ground Clause (rule instance) whose head is fact, or null if the atom is not a derived fact (i.e., it is an EDB base fact or was never derived).
Because DatalogEngineWithProvenance extends DatalogEngine, it supports the same init() and query() lifecycle.

Creating an engine with provenance

SemiNaiveEngine is currently the only engine that supports provenance collection. Use its static factory method:
DatalogEngineWithProvenance engine = SemiNaiveEngine.newEngineWithProvenance();
Passing true to the underlying constructor enables provenance bookkeeping during evaluation. The standard SemiNaiveEngine.newEngine() skips this overhead.

Building a provenance tree

The following example (from SemiNaiveEngine.java) initializes an engine with a transitive closure program and then walks the derivation tree for tc(c, c):
String[] lines = {
  "edge(a, b).",
  "edge(b, c).",
  "edge(c, d).",
  "edge(d, c).",
  "tc(X, Y) :- edge(X, Y).",
  "tc(X, Y) :- tc(X, Z), tc(Z, Y)."
};
StringBuilder sb = new StringBuilder();
for (String line : lines) {
  sb.append(line);
}
Reader r = new StringReader(sb.toString());
DatalogTokenizer t = new DatalogTokenizer(r);
Set<Clause> prog = DatalogParser.parseProgram(t);

DatalogEngineWithProvenance engine = SemiNaiveEngine.newEngineWithProvenance();
engine.init(prog);

// Look up the justification for tc(c, c)
t = new DatalogTokenizer(new StringReader("tc(c, c)?"));
PositiveAtom q = DatalogParser.parseQuery(t);

Clause lastCl = engine.getJustification(q);
System.out.println(lastCl);                       // the rule instance used last

for (Premise p : lastCl.getBody()) {
  if (p instanceof PositiveAtom) {
    Clause secondLastCl = engine.getJustification((PositiveAtom) p);
    System.out.println(secondLastCl);             // the rule instance one step back

    for (Premise p2 : secondLastCl.getBody()) {
      System.out.println(engine.getJustification((PositiveAtom) p2));  // and so on
    }
  }
}
The program being evaluated:
edge(a, b).
edge(b, c).
edge(c, d).
edge(d, c).
tc(X, Y) :- edge(X, Y).
tc(X, Y) :- tc(X, Z), tc(Z, Y).
Each call to getJustification() returns one ground rule instance. Walking the body atoms recursively produces the full derivation tree. When getJustification() returns null for a body atom, that atom is a base EDB fact — the leaf of the tree.
Provenance information is collected during evaluation, not on demand. Enabling it with newEngineWithProvenance() means the engine stores a justification entry for every derived fact, which adds memory overhead proportional to the size of the derived relation.
Provenance is useful for debugging Datalog programs. If a query returns an unexpected fact, call getJustification() on it and walk the tree to find the rule or base fact responsible. This is much faster than tracing evaluation manually through a large program.

Build docs developers (and LLMs) love