How pi-steering Evaluates Tool Calls: AST Pipeline

pi-steering processes every tool invocation through two independent pipelines. The evaluator runs on tool_call events, before pi executes anything, and returns a block verdict if a rule fires. The dispatcher runs on tool_result events, after execution completes, and fires matching observers that write typed session entries for later rules to consult. Both pipelines share a single session-entry store (pi’s session JSONL), but they run at different points in the agent lifecycle and have different responsibilities.

Concrete Execution Trace

The clearest way to understand the evaluation pipeline is to follow a real example end-to-end. Here is what happens when an agent issues bash("git push --force && cd /tmp && git log") under the quickstart config:

User prompt sent to pi.

1. pi.on("agent_start") → engine bumps agentLoopIndex from N to N+1.
   One "agent loop" = one user prompt + every tool call it spawns.

2. Agent decides to run the bash tool with:
     command = "git push --force && cd /tmp && git log"

3. pi emits tool_call. Evaluator runs (once per tool_call):

   a. parseBash(command)       → AST
   b. extractAllCommandsFromAST → 3 CommandRefs:
        ref#0: basename="git", args=[push, --force]   (Word[])
        ref#1: basename="cd",  args=[/tmp]
        ref#2: basename="git", args=[log]
   c. expandWrapperCommands    → no wrappers; still 3 refs.
   d. walk(ast, { cwd }, trackers) → per-ref state:
        ref#0 at cwd=/original
        ref#1 at cwd=/original
        ref#2 at cwd=/tmp  (the `cd /tmp` applied)
      Walker-level speculative-entry synthesis runs in the same pass,
      populating `walkerState.events` per ref (see "`&&`-chain
      speculative allow" below).
   e. For each ref × for each rule, build a Candidate:
        input.command   = ref.text (FLATTENED: "git push --force")
        input.basename  = "git"
        input.args      = ref.node.suffix (Word[] with quote-aware .value)
        cwd             = walkerState.cwd   (per-ref)
        walkerState     = { cwd, branch, …, events }  (all trackers +
                          synthesized events under the reserved `events` key)
        agentLoopIndex  = N+1
   f. Test rule.pattern / requires / unless against ref.text.
      Run when.cwd / when.branch / when.happened / plugin predicates.
      `when.happened` merges real entries (ctx.findEntries) with
      synthesized speculative ones (walkerState.events) by timestamp
      — one unified latest-entry comparison.
   g. The first rule whose predicates ALL pass wins.
      If the rule defines `onFire`, invoke it first (it may write session
      entries, which the engine auto-tags with _agentLoopIndex).
      Then return { block: true, reason: "[steering:no-force-push@user] …" }.

4. If no rule blocked, pi executes the command.

5. pi emits tool_result. Dispatcher runs (once per tool_result):

   a. Parse event.input.command via walker. (The dispatcher parses
      independently from step 3 today — sub-millisecond per event.
      Cross-step AST caching is a future optimization.)
   b. For every observer whose `watch` filter matches:
        - `watch.inputMatches.command` matches raw outer command
          OR any ref.text (wrapper-aware, ADR §12).
        - `observer.onResult(event, observerCtx)` fires.
        - observerCtx.appendEntry(type, data) writes an entry —
          auto-tagged with _agentLoopIndex for `when.happened` filtering.

Key Concepts

One Parse, Many Rules

The AST walk happens once per tool call. Every rule sees the same extracted CommandRef array and the same walker state snapshot; there is no per-rule re-parse. Adding a rule costs only that rule’s own pattern match and predicate checks, because the expensive work (parsing, AST traversal, tracker updates, speculative-entry synthesis) is shared across all rules in the evaluator pass.

The walk itself is cheap: a typical bash command produces three to ten CommandRef entries and a proportionally small AST, so rules can be added freely without a noticeable evaluation cost.
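The shared-walk idea can be sketched as follows. This is a minimal illustration, not the engine's code: the `CommandRef`, `Rule`, and `Verdict` shapes, the `evaluate` function, and the ref-outer / rule-inner loop order are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real pi-steering types may differ.
interface CommandRef {
  text: string; // flattened command text, e.g. "git push --force"
  cwd: string;  // walker-resolved cwd for this ref
}

interface Rule {
  name: string;
  pattern: RegExp;
  reason: string;
}

interface Verdict {
  block: true;
  reason: string;
}

// The refs are computed once and passed in; no rule triggers a re-parse.
function evaluate(refs: readonly CommandRef[], rules: readonly Rule[]): Verdict | undefined {
  for (const ref of refs) {
    for (const rule of rules) {
      if (rule.pattern.test(ref.text)) {
        // Assumption: every rule here is a user rule, hence the "@user" tag.
        return { block: true, reason: `[steering:${rule.name}@user] ${rule.reason}` };
      }
    }
  }
  return undefined; // no rule fired, so the tool call proceeds
}

// The three refs from the trace above, written out by hand.
const refs: CommandRef[] = [
  { text: "git push --force", cwd: "/original" },
  { text: "cd /tmp", cwd: "/original" },
  { text: "git log", cwd: "/tmp" },
];

const rules: Rule[] = [
  { name: "no-force-push", pattern: /^git\s+push\b.*--force\b/, reason: "Force pushes are not allowed." },
];

const verdict = evaluate(refs, rules);
```

Here `verdict` is a block verdict produced by the first ref; appending more entries to `rules` adds only more `pattern.test` calls per ref.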

Per-Ref Evaluation

In cd /tmp && git log, the git log ref is not evaluated at the session’s original cwd. The walker’s cwdTracker processes cd commands as they appear in the AST and carries the updated cwd forward to subsequent refs. When the evaluator builds the candidate for git log, it uses the walker-resolved cwd (/tmp in this case), not the session root.

This matters for when.cwd-scoped rules. A rule that fires only when cwd matches /tmp will correctly fire on the git log ref, and will correctly not fire on the git push --force ref earlier in the same chain (which runs at the original cwd).

The same mechanism applies to other tracker dimensions. The git plugin’s branchTracker processes git checkout and git switch refs in-chain, so a when.branch predicate evaluates against the branch that will be active at the ref’s position in the chain, not the branch at session start.
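A rough sketch of how per-ref cwd can be carried along a chain is shown below. It is deliberately simplified: it works on a flat list of refs from a plain &&-chain instead of a bash AST, and the `Ref` shape and `effectiveCwds` helper are hypothetical names, not part of pi-steering.

```typescript
// Hypothetical, simplified ref shape: the real walker operates on a bash AST.
interface Ref {
  basename: string;
  args: string[];
}

// Returns the cwd each ref would run in, for a plain &&-chain.
function effectiveCwds(refs: readonly Ref[], initialCwd: string): string[] {
  const result: string[] = [];
  let cwd = initialCwd;
  for (const ref of refs) {
    result.push(cwd); // a ref runs in the cwd left behind by the refs before it
    const target = ref.args[0];
    if (ref.basename === "cd" && target !== undefined) {
      // Simplification: absolute targets replace cwd, relative ones are appended.
      cwd = target.startsWith("/") ? target : `${cwd}/${target}`;
    }
  }
  return result;
}

const cwds = effectiveCwds(
  [
    { basename: "git", args: ["push", "--force"] },
    { basename: "cd", args: ["/tmp"] },
    { basename: "git", args: ["log"] },
  ],
  "/original",
);
```

For this input `cwds` is `["/original", "/original", "/tmp"]`: the `cd` ref itself still runs at the original cwd, and only the ref after it sees `/tmp`.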

Source-Tagged Reasons

Every block reason is prefixed with a structured tag:

[steering:<rule-name>@<source>] <reason text>

The <source> is either user (for rules in your .pi/steering/index.ts) or the name of the plugin that ships the rule (e.g. git for the built-in git plugin). This lets the agent see both which rule fired and where to find it: in the config file for user rules, or in the plugin’s documentation for plugin rules.

Name validation prevents tag spoofing. A plugin or rule with a name like phony] ALL CLEAR [real would forge the bracket structure and deceive the agent; the engine rejects such names at load time with a hard error (invalid-name diagnostic).
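The tag format and the spoofing check might look roughly like this. The allowed-character set in `SAFE_NAME` is an assumption chosen for illustration (the actual validation rules are not stated here), and `assertValidName` and `formatReason` are hypothetical helper names.

```typescript
// Assumed rule: names may contain only letters, digits, ".", "_" and "-".
// The engine's actual validation may be stricter or looser.
const SAFE_NAME = /^[A-Za-z0-9._-]+$/;

function assertValidName(name: string): void {
  if (!SAFE_NAME.test(name)) {
    throw new Error(`invalid-name: ${JSON.stringify(name)}`);
  }
}

// Builds "[steering:<rule-name>@<source>] <reason text>".
function formatReason(ruleName: string, source: string, text: string): string {
  assertValidName(ruleName);
  assertValidName(source);
  return `[steering:${ruleName}@${source}] ${text}`;
}

const tagged = formatReason("no-force-push", "user", "Force pushes are not allowed.");
```

With this check, a name such as `phony] ALL CLEAR [real` throws before it can be interpolated into a tag, because it contains brackets and spaces.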

First Match Wins

Rule evaluation short-circuits on the first rule whose every predicate passes. Rule order matters:

Within a single config layer, rules are evaluated in declaration order. Put more specific rules before more general ones.
Across walk-up config layers, inner layers (closer to the project cwd) take precedence on rule-name collision. An inner layer that ships a rule with the same name as an outer layer’s rule replaces the outer layer’s version.
DEFAULT_RULES and DEFAULT_PLUGINS are applied after user rules and user plugins, so user-authored rules always have priority.
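The ordering rules above can be sketched as a merge step followed by a first-match scan. Several details are assumptions made for the example: layers are passed innermost first, a replaced rule keeps the position of the inner layer's version, and defaults are treated like one more outer layer. `mergeRules` and `firstMatch` are hypothetical names.

```typescript
interface Rule {
  name: string;
  pattern: RegExp;
}

// `layers` is ordered innermost first (assumption about how walk-up results arrive).
function mergeRules(layers: readonly (readonly Rule[])[], defaults: readonly Rule[]): Rule[] {
  const seen = new Set<string>();
  const merged: Rule[] = [];
  for (const layer of [...layers, defaults]) {
    for (const rule of layer) {
      if (seen.has(rule.name)) continue; // an inner layer already claimed this name
      seen.add(rule.name);
      merged.push(rule);
    }
  }
  return merged;
}

// Returns the name of the first rule whose pattern matches, if any.
function firstMatch(rules: readonly Rule[], command: string): string | undefined {
  return rules.find((rule) => rule.pattern.test(command))?.name;
}

const inner: Rule[] = [
  { name: "no-push", pattern: /^git push --force/ },
  { name: "no-git", pattern: /^git\b/ },
];
const outer: Rule[] = [{ name: "no-push", pattern: /^git push/ }];
const defaults: Rule[] = [{ name: "no-rm", pattern: /^rm\b/ }];

const merged = mergeRules([inner, outer], defaults);
```

In this example the outer layer's `no-push` is dropped in favor of the inner one, so `git push origin` falls through to the more general `no-git` rule, while `git push --force` is caught by `no-push` because it is declared first.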

Two Event Types

tool_call → Evaluator

Runs before pi executes the tool. The evaluator parses the bash command, extracts command refs, walks the AST with registered trackers, and tests each ref against every active rule. Returns a block verdict immediately if any rule fires. If no rule fires, the tool call proceeds.

tool_result → Dispatcher

Runs after pi executes the tool. The dispatcher re-parses the command, iterates registered observers, checks each observer’s watch filter, and calls onResult on matching observers. Observers write typed session entries via ctx.appendEntry so later tool_call evaluations can gate on them via when.happened.
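How the two event types cooperate can be illustrated with a small in-memory simulation of the "sync before cr" workflow. Everything here is hypothetical scaffolding (`SessionSketch`, its method names, the `sync-done` entry type, and the `sync-before-cr` rule); it only mimics the division of labor between evaluator and dispatcher and is not how pi wires its events.

```typescript
interface Entry {
  type: string;
  agentLoopIndex: number;
}

interface Verdict {
  block: true;
  reason: string;
}

class SessionSketch {
  private entries: Entry[] = [];
  private agentLoopIndex = 0;

  // agent_start: a new user prompt begins a new agent loop.
  onAgentStart(): void {
    this.agentLoopIndex += 1;
  }

  // tool_call (evaluator role): block "cr" unless sync ran in this agent loop.
  onToolCall(command: string): Verdict | undefined {
    if (command !== "cr") return undefined;
    const synced = this.entries.some(
      (entry) => entry.type === "sync-done" && entry.agentLoopIndex === this.agentLoopIndex,
    );
    return synced
      ? undefined
      : { block: true, reason: "[steering:sync-before-cr@user] Run sync first." };
  }

  // tool_result (dispatcher role): an observer records that sync ran.
  onToolResult(command: string): void {
    if (command === "sync") {
      this.entries.push({ type: "sync-done", agentLoopIndex: this.agentLoopIndex });
    }
  }
}

const session = new SessionSketch();
session.onAgentStart();
const beforeSync = session.onToolCall("cr"); // blocked: no entry yet
session.onToolResult("sync");
const afterSync = session.onToolCall("cr"); // allowed: entry written in this loop
```

The evaluator never writes history itself in this sketch; it only reads what the dispatcher side recorded on an earlier tool_result.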

Glossary

Time Scope

Rules using when.happened must specify which scope to search for prior session entries. Three scopes are available:

agent_loop — the current user prompt plus every tool call it spawns. The engine bumps an internal agentLoopIndex counter on each agent_start event; entries are tagged with that index on write, and in: "agent_loop" filters by _agentLoopIndex === ctx.agentLoopIndex. The most common scope for workflow rules like “must run sync before cr in the same prompt.”
session — the entire pi session across all agent loops. No scope filter; any entry of the given type in the session JSONL satisfies the clause. Use for one-time-per-session setup checks.
tool_call — the current bash tool call only, considering ONLY speculative entries synthesized from &&-reachable observers. Real (persisted) entries are ignored entirely. Use when the event must be chained directly before the guarded command (sync && cr) rather than merely having happened somewhere this agent loop.
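The three scopes amount to three different filters over the same list of entries. The sketch below assumes a hypothetical `Entry` shape in which speculative entries also carry the current `_agentLoopIndex`, and a hypothetical `entriesInScope` helper that receives real and speculative entries already merged.

```typescript
type Scope = "agent_loop" | "session" | "tool_call";

interface Entry {
  type: string;
  _agentLoopIndex: number;
  speculative?: boolean; // true only for synthesized entries
}

function entriesInScope(
  entries: readonly Entry[],
  type: string,
  scope: Scope,
  agentLoopIndex: number,
): Entry[] {
  return entries.filter((entry) => {
    if (entry.type !== type) return false;
    if (scope === "tool_call") return entry.speculative === true; // real entries ignored
    if (scope === "agent_loop") return entry._agentLoopIndex === agentLoopIndex;
    return true; // "session": no scope filter
  });
}

const entries: Entry[] = [
  { type: "sync-done", _agentLoopIndex: 1 },
  { type: "sync-done", _agentLoopIndex: 2 },
  { type: "sync-done", _agentLoopIndex: 2, speculative: true },
];

const inSession = entriesInScope(entries, "sync-done", "session", 2);
const inLoop = entriesInScope(entries, "sync-done", "agent_loop", 2);
const inCall = entriesInScope(entries, "sync-done", "tool_call", 2);
```

For this data the session scope sees all three entries, the agent_loop scope sees the two tagged with index 2, and the tool_call scope sees only the speculative one.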

Entry Origin

Session entries have one of two origins, real or speculative, which matters for when.happened evaluation. The synthesis pass listed last is the step that produces speculative entries:

Real entry — persisted in pi’s session JSONL via ctx.appendEntry. Written by observers after tool_result events. Outlives the current tool call; visible in agent_loop and session scopes.
Speculative entry — synthesized by the engine for a &&-chain during the evaluator pass, representing “if this chain runs to completion, this entry WILL be written.” Not persisted; exists only for the current evaluation. Carries speculative: true so plugin predicates can filter such entries out if needed. It is the only kind of entry visible in tool_call scope, and it also participates in agent_loop / session scopes through the merged timestamp comparison.
Synthesis pass — the walker-level pass that produces speculative entries. Runs once per tool call during the AST walk. For every ref in an unconditionally-&&-reachable segment, every observer declaring writes: [event] and matching the ref contributes a synthetic entry into the next ref’s walkerState.events.

Shell Constructs

The AST walk handles all standard bash composition constructs. Their differences matter for rule evaluation and speculative synthesis:

&&-chain (A && B && C) — B runs only if A succeeds. The evaluator performs speculative-entry synthesis across && joints: if A’s observer writes an event, B’s when.happened predicate can see a speculative entry for it. Safe to allow speculatively because if A fails, B never runs.
Pipeline (A | B) — each peer runs in its own subshell. Cwd, branch, and environment effects do not propagate across pipeline peers. No speculative synthesis across |.
Subshell ((…)) — cwd, branch, and environment effects are isolated to the subshell’s body. The walker tracks subshell entry and exit so per-ref state reflects the correct scope.
Semicolon (A ; B) — B runs regardless of A’s exit status. No speculative synthesis because the guarantee that “A succeeded before B ran” does not hold.
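The difference between the joints can be sketched for a flat chain. This is an illustration under stated assumptions, not the walker's algorithm: the chain is a flat list with the operator that follows each ref, subshells are ignored, speculative events accumulate across consecutive && joints, and any other joint clears them. `ChainRef`, `ObserverDecl`, and `speculativeEvents` are hypothetical names.

```typescript
type Joint = "&&" | ";" | "|";

interface ChainRef {
  text: string;
  next?: Joint; // operator joining this ref to the following one
}

interface ObserverDecl {
  matches: RegExp;  // which refs the observer watches
  writes: string[]; // event types it declares it will write
}

// For each ref, lists the event types it may speculatively assume have happened.
function speculativeEvents(
  refs: readonly ChainRef[],
  observers: readonly ObserverDecl[],
): string[][] {
  const perRef: string[][] = [];
  let carried: string[] = [];
  for (const ref of refs) {
    perRef.push([...carried]);
    if (ref.next === "&&") {
      // The next ref only runs if this one succeeds, so its events can be assumed.
      for (const observer of observers) {
        if (observer.matches.test(ref.text)) carried.push(...observer.writes);
      }
    } else {
      carried = []; // no synthesis across ";" or "|"
    }
  }
  return perRef;
}

const observers: ObserverDecl[] = [{ matches: /^sync$/, writes: ["sync-done"] }];

const chained = speculativeEvents([{ text: "sync", next: "&&" }, { text: "cr" }], observers);
const sequenced = speculativeEvents([{ text: "sync", next: ";" }, { text: "cr" }], observers);
```

With `sync && cr`, the `cr` ref sees a speculative `sync-done`; with `sync ; cr` it sees nothing, because `cr` would run even if `sync` failed.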

Hook Surfaces

pi-steering provides two distinct surfaces where plugin and user code hook into the evaluation pipeline:

Tracker — walker-level, static. Models per-ref state (cwd, branch, environment variables) from the bash AST before execution. Trackers run during the AST walk and produce the walkerState snapshot that predicates consume. Plugin authors register trackers under Plugin.trackers. Tracker-name collisions are a hard error — two plugins claiming the same state dimension is always a bug.
Observer — engine-level, dynamic. Watches tool_result events and persists session entries via ctx.appendEntry. Observers run after execution and produce the historical state that when.happened predicates consult. Plugin authors register observers under Plugin.observers. Observer-name collisions log a warning and keep the first-registered observer.
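The two collision policies can be contrasted in a small registration sketch. The `PluginSketch` shape (plain name lists instead of real tracker and observer objects), the `Registry` type, and the `register` function are assumptions for illustration; only the policies themselves (hard error for trackers, warn and keep first for observers) come from the text above.

```typescript
// Hypothetical, reduced plugin shape: only the names matter for collision handling.
interface PluginSketch {
  name: string;
  trackers: string[];
  observers: string[];
}

interface Registry {
  trackers: Map<string, string>;  // tracker name -> owning plugin
  observers: Map<string, string>; // observer name -> owning plugin
  warnings: string[];
}

function register(plugins: readonly PluginSketch[]): Registry {
  const registry: Registry = { trackers: new Map(), observers: new Map(), warnings: [] };
  for (const plugin of plugins) {
    for (const name of plugin.trackers) {
      const owner = registry.trackers.get(name);
      if (owner !== undefined) {
        // Two plugins modelling the same state dimension: fail loudly.
        throw new Error(`tracker "${name}" registered by both ${owner} and ${plugin.name}`);
      }
      registry.trackers.set(name, plugin.name);
    }
    for (const name of plugin.observers) {
      if (registry.observers.has(name)) {
        registry.warnings.push(`observer "${name}" from ${plugin.name} ignored; first registration kept`);
        continue;
      }
      registry.observers.set(name, plugin.name);
    }
  }
  return registry;
}

const registry = register([
  { name: "git", trackers: ["branch"], observers: ["push-seen"] },
  { name: "extra", trackers: ["env"], observers: ["push-seen"] },
]);
```

Here the duplicate `push-seen` observer produces one warning and stays owned by `git`; a second plugin declaring a `branch` tracker would instead make `register` throw.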

Walker Terminology

The unbash-walker introduces specific terminology for shell-semantics concepts:

Effective cwd — the working directory a command runs at, computed statically by the walker from preceding cd and -C constructs in the same chain. Always tool_call-scoped (fresh per bash invocation, not persisted across tool calls). Dynamic targets — cd "$WS_DIR/pkg", cd ~/proj — resolve through the walker’s env tracker. Intractable targets (cd $(pwd), cd $UNDEFINED) surface as the "unknown" sentinel; apply onUnknown: "allow" | "block" to choose the policy (default "block", fail-closed).
CommandRef — one extracted command node from the AST with its arguments, per bash tool call. A single bash invocation with &&-chained commands produces multiple CommandRef entries. Each ref carries basename, args (quote-aware Word[]), and the walker-resolved per-ref state snapshot. Rules are evaluated once per CommandRef, not once per tool call.
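One possible reading of the onUnknown policy is sketched below. The interpretation is an assumption: with "block" (the default) an unknown cwd makes the condition pass so the guarding rule still fires, and with "allow" the condition fails so the command goes through. `CwdCondition` and `cwdConditionPasses` are hypothetical names.

```typescript
type OnUnknown = "allow" | "block";

// Sentinel the walker reports for intractable targets such as `cd $(pwd)`.
const UNKNOWN = "unknown";

interface CwdCondition {
  pattern: RegExp;
  onUnknown?: OnUnknown; // defaults to "block" (fail-closed)
}

// Returns true when the condition holds, i.e. when the guarding rule may fire.
function cwdConditionPasses(condition: CwdCondition, effectiveCwd: string): boolean {
  if (effectiveCwd === UNKNOWN) {
    // Assumed semantics: fail-closed means treating the unknown cwd as a match.
    return (condition.onUnknown ?? "block") === "block";
  }
  return condition.pattern.test(effectiveCwd);
}

const strict: CwdCondition = { pattern: /^\/tmp(\/|$)/ };
const lenient: CwdCondition = { pattern: /^\/tmp(\/|$)/, onUnknown: "allow" };

const strictOnUnknown = cwdConditionPasses(strict, UNKNOWN);
const lenientOnUnknown = cwdConditionPasses(lenient, UNKNOWN);
```

Under this reading `strictOnUnknown` is true and `lenientOnUnknown` is false, while known directories are decided by the pattern alone.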
