Why-provenance: tracing how facts were derived

Provenance tracking records, for each derived fact, which ground rule instance (a fully substituted clause with no variables) was the last step used to derive it. By recursively calling getJustification() on the body atoms of that rule instance, you can reconstruct a complete derivation tree explaining why a fact holds in the program.

The DatalogEngineWithProvenance interface

DatalogEngineWithProvenance extends DatalogEngine with one additional method:

Clause getJustification(PositiveAtom fact);

fact — a ground atom to look up.
Returns the ground Clause (rule instance) whose head is fact, or null if the atom is not a derived fact (i.e., it is an EDB base fact or was never derived).

Because DatalogEngineWithProvenance extends DatalogEngine, it supports the same init() and query() lifecycle.

Creating an engine with provenance

SemiNaiveEngine is currently the only engine that supports provenance collection. Use its static factory method:

DatalogEngineWithProvenance engine = SemiNaiveEngine.newEngineWithProvenance();

Passing true to the underlying constructor enables provenance bookkeeping during evaluation. The standard SemiNaiveEngine.newEngine() skips this overhead.

Building a provenance tree

The following example (from SemiNaiveEngine.java) initializes an engine with a transitive closure program and then walks the derivation tree for tc(c, c):

String[] lines = {
  "edge(a, b).",
  "edge(b, c).",
  "edge(c, d).",
  "edge(d, c).",
  "tc(X, Y) :- edge(X, Y).",
  "tc(X, Y) :- tc(X, Z), tc(Z, Y)."
};
StringBuilder sb = new StringBuilder();
for (String line : lines) {
  sb.append(line);
}
Reader r = new StringReader(sb.toString());
DatalogTokenizer t = new DatalogTokenizer(r);
Set<Clause> prog = DatalogParser.parseProgram(t);

DatalogEngineWithProvenance engine = SemiNaiveEngine.newEngineWithProvenance();
engine.init(prog);

// Look up the justification for tc(c, c)
t = new DatalogTokenizer(new StringReader("tc(c, c)?"));
PositiveAtom q = DatalogParser.parseQuery(t);

Clause lastCl = engine.getJustification(q);
System.out.println(lastCl);                       // the rule instance used last

for (Premise p : lastCl.getBody()) {
  if (p instanceof PositiveAtom) {
    Clause secondLastCl = engine.getJustification((PositiveAtom) p);
    System.out.println(secondLastCl);             // the rule instance one step back

    for (Premise p2 : secondLastCl.getBody()) {
      System.out.println(engine.getJustification((PositiveAtom) p2));  // and so on
    }
  }
}

The program being evaluated:

edge(a, b).
edge(b, c).
edge(c, d).
edge(d, c).
tc(X, Y) :- edge(X, Y).
tc(X, Y) :- tc(X, Z), tc(Z, Y).

Each call to getJustification() returns one ground rule instance. Walking the body atoms recursively produces the full derivation tree. When getJustification() returns null for a body atom, that atom is a base EDB fact — the leaf of the tree.

Provenance information is collected during evaluation, not on demand. Enabling it with newEngineWithProvenance() means the engine stores a justification entry for every derived fact, which adds memory overhead proportional to the size of the derived relation.

Provenance is useful for debugging Datalog programs. If a query returns an unexpected fact, call getJustification() on it and walk the tree to find the rule or base fact responsible. This is much faster than tracing evaluation manually through a large program.

Get Started

Language Guide

Evaluation Engines

Executor API

Extending AbcDatalog

Why-provenance: tracing how facts were derived

The DatalogEngineWithProvenance interface

Creating an engine with provenance

Building a provenance tree

Build docs developers (and LLMs) love

Get Started

Language Guide

Evaluation Engines

Executor API

Extending AbcDatalog

Documentation Index

​The DatalogEngineWithProvenance interface

​Creating an engine with provenance

​Building a provenance tree

Build docs developers (and LLMs) love

The DatalogEngineWithProvenance interface

Creating an engine with provenance

Building a provenance tree