Testing pi-steering Rules Without Running pi

pi-steering exports a pi-steering/testing subpath with primitives that exercise the full evaluation pipeline without booting pi. loadHarness builds the same evaluator and observer dispatcher pair that production uses; expectBlocks and expectAllows then drive events through it and assert verdicts in a single call. All tests are deterministic and CI-safe — no pi runtime stub, no file-system walk-up, no live exec unless you explicitly wire one.

Import path

import {
  loadHarness,
  expectBlocks,
  expectAllows,
  testPredicate,
  testObserver,
  runMatrix,
  formatMatrix,
} from "pi-steering/testing";

Harness-level testing

loadHarness accepts a static SteeringConfig and returns a Harness — a build-once, invoke-many handle whose evaluate and dispatch methods are identical in signature to the production runtime’s. Tests that call harness.evaluate directly are invoking exactly the same code path pi does on every tool_call event.

const harness = loadHarness({
  config: { plugins: [myPlugin], rules: [...] },
});

`expectBlocks`

Assert that the harness returns a block verdict for the given event. Throws on allow.

await expectBlocks(
  harness,
  { command: "git push --force" },
  { rule: "no-force-push" },
);

The second argument accepts bash, write, and edit shorthand forms:

{ command } — bash tool call
{ write: { path, content } } — write tool call
{ edit: { path, edits } } — edit tool call

The optional third argument narrows the assertion. rule checks that the named rule fired (matched against the [steering:<rule>@<source>] prefix in the reason string; the source suffix is ignored). reason accepts an exact string or a RegExp for a looser match. Omit both to assert only that something blocked.
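The rule check described above comes down to reading a prefix off the reason string. The following is a minimal sketch of that idea, not the library's implementation: `ruleFromReason` and `reasonMatches` are hypothetical helpers written for illustration, and the sample reason text and its `user` source are made up.

```typescript
// Hypothetical helper (not a pi-steering export): pull the rule name out of a
// reason that begins with the `[steering:<rule>@<source>]` prefix.
function ruleFromReason(reason: string): string | undefined {
  // Capture up to "@" or "]"; the optional "@<source>" part is ignored.
  const match = /^\[steering:([^@\]]+)(?:@[^\]]*)?\]/.exec(reason);
  return match?.[1];
}

// Hypothetical matcher for the `reason` option: exact string or RegExp.
function reasonMatches(actual: string, expected: string | RegExp): boolean {
  return typeof expected === "string"
    ? actual === expected
    : expected.test(actual);
}

// Made-up reason string, used only to exercise the helpers.
const sampleReason = "[steering:no-force-push@user] Force pushes are not allowed.";
console.log(ruleFromReason(sampleReason));
console.log(reasonMatches(sampleReason, /force push/i));
```

Because the source suffix is dropped before comparison, a test written this way keeps passing when the same rule is moved between config layers.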

`expectAllows`

Assert that the harness allows the given event — no rule fires.

await expectAllows(harness, { command: "git push" });

Throws with a rich message — including which rule fired and its full reason — if the event is unexpectedly blocked.

Full example from the `work-item-plugin`

The canonical plugin example wires a minimal plugin (just the predicate the rule needs) to keep each rule test focused:

import { describe, it } from "node:test";
import { expectAllows, expectBlocks, loadHarness } from "pi-steering/testing";
import type { Plugin } from "pi-steering";
import { workItemFormat } from "../predicates/work-item-format.ts";
import { commitRequiresWorkItem } from "./commit-requires-work-item.ts";

const testPlugin: Plugin = {
  name: "test",
  predicates: { workItemFormat },
};

describe("commit-requires-work-item", () => {
  const harness = loadHarness({
    config: {
      plugins: [testPlugin],
      rules: [commitRequiresWorkItem],
    },
  });

  it("blocks a commit missing the work-item tag", async () => {
    await expectBlocks(
      harness,
      { command: 'git commit -m "feat: add thing"' },
      { rule: "commit-requires-work-item" },
    );
  });

  it("allows a commit containing [PROJ-N]", async () => {
    await expectAllows(harness, {
      command: 'git commit -m "feat: add thing [PROJ-42]"',
    });
  });

  it("allows commits using --message long form", async () => {
    await expectAllows(harness, {
      command: 'git commit --message "fix [PROJ-1] bad thing"',
    });
  });

  it("does NOT fire on git log --grep=\"commit\"", async () => {
    await expectAllows(harness, {
      command: 'git log --grep="commit"',
    });
  });
});

Unit-level testing

For predicates and observers that you want to exercise in complete isolation — without standing up a full harness — the testing subpath provides testPredicate and testObserver.

`testPredicate`

Build a PredicateContext from MockContextOptions and call the handler directly. Returns the boolean verdict.

const fires = await testPredicate(branch, /^main$/, {
  walkerState: { branch: "main" },
});

Supply only the MockContextOptions fields your predicate reads. The defaults fill in cwd: "/tmp/test", an empty env map, and an exec stub that rejects loudly if accidentally called — so a test that forgets to wire exec fails explicitly rather than silently evaluating against an empty result.
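To see why a rejecting default is safer than a silent one, here is a small sketch of the pattern. It is an illustration under assumptions: `SketchExecResult`, `rejectingExec`, `stubbedExec`, and `pickExec` are hypothetical names, and the real `ExecResult` shape may differ.

```typescript
// Assumed result shape for illustration; the real ExecResult may differ.
interface SketchExecResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}
type SketchExec = (cmd: string, args: string[]) => Promise<SketchExecResult>;

// Default stub: reject with a message naming the command, so a forgotten
// stub fails the test instead of looking like an empty, successful run.
const rejectingExec: SketchExec = (cmd, args) =>
  Promise.reject(
    new Error(`exec not stubbed in test: ${cmd} ${args.join(" ")}`),
  );

// A test that really needs exec supplies its own canned result.
const stubbedExec: SketchExec = async () => ({
  stdout: "main\n",
  stderr: "",
  exitCode: 0,
});

// Fall back to the rejecting stub only when the test passed nothing.
function pickExec(custom?: SketchExec): SketchExec {
  return custom ?? rejectingExec;
}

// With no stub wired, the call rejects and the message names the command.
pickExec()("git", ["branch", "--show-current"]).catch((err: Error) => {
  console.log(err.message);
});
```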

`testObserver`

Drive an observer’s onResult handler at a synthetic tool_result event and inspect what it appended.

const { entries, watchMatched } = await testObserver(
  myObserver,
  { toolName: "bash", input: { command: "npm test" }, output: {}, exitCode: 0 },
);

watchMatched tells you whether the observer’s watch filter accepted the event — onResult is only called when it does, mirroring production dispatch. entries is the list of appendEntry writes captured from the handler. The npm-test-tracker test suite from the work-item-plugin example demonstrates the full contract:

import { describe, it } from "node:test";
import assert from "node:assert/strict";
import { testObserver } from "pi-steering/testing";
import { TEST_PASSED_EVENT, npmTestTracker } from "./npm-test-tracker.ts";

describe("npm-test-tracker observer", () => {
  it("records a TEST_PASSED_EVENT entry on successful `npm test`", async () => {
    const { entries, watchMatched } = await testObserver(
      npmTestTracker,
      {
        toolName: "bash",
        input: { command: "npm test" },
        output: {},
        exitCode: 0,
      },
    );

    assert.equal(watchMatched, true);
    assert.equal(entries.length, 1);
    assert.equal(entries[0]?.customType, TEST_PASSED_EVENT);
    assert.deepEqual(entries[0]?.data, {
      command: "npm test",
      // The dispatcher auto-tags plain-object payloads with
      // `_agentLoopIndex` — default mockObserverContext has index 0.
      _agentLoopIndex: 0,
    });
  });

  it("does NOT fire on a failed `npm test` (exitCode non-zero)", async () => {
    const { entries, watchMatched } = await testObserver(
      npmTestTracker,
      { toolName: "bash", input: { command: "npm test" }, output: {}, exitCode: 1 },
    );
    assert.equal(watchMatched, false);
    assert.equal(entries.length, 0);
  });

  it("does NOT fire on an unrelated command", async () => {
    const { entries, watchMatched } = await testObserver(
      npmTestTracker,
      { toolName: "bash", input: { command: "npm install" }, output: {}, exitCode: 0 },
    );
    assert.equal(watchMatched, false);
    assert.equal(entries.length, 0);
  });

  it("tags writes with the agent-loop index from ctx", async () => {
    const { entries } = await testObserver(
      npmTestTracker,
      { toolName: "bash", input: { command: "npm test" }, output: {}, exitCode: 0 },
      { agentLoopIndex: 42 },
    );
    const data = entries[0]?.data as { _agentLoopIndex: number };
    assert.equal(data._agentLoopIndex, 42);
  });
});
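The contract these tests pin (the watch filter gates `onResult`, and plain-object payloads are tagged with `_agentLoopIndex`) can be modelled in a few lines. This is a simplified sketch, not the real dispatcher: every `Sketch*` type, `driveObserver`, `toyTracker`, and the `"test-passed"` event name are hypothetical, and the real `watch` filter is likely declarative rather than a function.

```typescript
interface SketchEvent {
  toolName: string;
  input: { command?: string };
  exitCode: number;
}
interface SketchEntry {
  customType: string;
  data: Record<string, unknown>;
}
interface SketchObserver {
  // Assumption: modelled as a function for brevity.
  watch(event: SketchEvent): boolean;
  onResult(
    event: SketchEvent,
    ctx: { appendEntry(customType: string, data: Record<string, unknown>): void },
  ): void;
}

// Call onResult only when the watch filter accepts the event, and tag every
// captured payload with the current agent-loop index.
function driveObserver(
  observer: SketchObserver,
  event: SketchEvent,
  agentLoopIndex = 0,
): { entries: SketchEntry[]; watchMatched: boolean } {
  const entries: SketchEntry[] = [];
  const watchMatched = observer.watch(event);
  if (watchMatched) {
    observer.onResult(event, {
      appendEntry(customType, data) {
        entries.push({
          customType,
          data: { ...data, _agentLoopIndex: agentLoopIndex },
        });
      },
    });
  }
  return { entries, watchMatched };
}

// Toy observer resembling the tracker above: records successful `npm test`.
const toyTracker: SketchObserver = {
  watch: (event) =>
    event.toolName === "bash" &&
    event.input.command === "npm test" &&
    event.exitCode === 0,
  onResult: (event, ctx) =>
    ctx.appendEntry("test-passed", { command: event.input.command }),
};
```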

`MockContextOptions` knobs

testPredicate (and mockContext directly) accept these options:

cwd

string

default: "/tmp/test"

The effective cwd the predicate sees via ctx.cwd and ctx.walkerState.cwd.

agentLoopIndex

number

default: 0

The engine’s agent-loop counter. Rules using when.happened: { in: "agent_loop" } scope filtering compare this against _agentLoopIndex on session entries — set it to match the entries you supply via priorEntry.

tool

"bash" | "write" | "edit"

default: "bash"

Which tool the predicate is evaluating under. Drives the default shape of input when no input is supplied.

input

PredicateToolInput

Full tool input. Derived from tool when omitted: bash gives { command: "" }, write gives { path: "", content: "" }, edit gives { path: "", edits: [] }.

walkerState

Partial<WhenWalkerState>

Walker-state snapshot the predicate sees. Merged over { cwd, env: new Map() } — supply only the fields your predicate reads. Pass { branch: "main" } to test a branch predicate without wiring the full git tracker.

exec

(cmd, args, opts?) => ExecResult | Promise<ExecResult>

Stub for ctx.exec. Defaults to rejecting with a clear error — tests that call out to exec must stub explicitly.

entries

ReadonlyArray<MockEntry>

Prior session entries findEntries reads from. Build entries with priorEntry(customType, data, { agentLoopIndex }) so the _agentLoopIndex tag is stamped exactly as the live engine would stamp it — a hand-rolled literal with a typo on the key name silently fails to match when.happened scope filtering.

toolCallEvents

Record<string, readonly SyntheticEntry[]>

Per-ref speculative events for when.happened: { in: "tool_call" } simulation. Keys are the customType event literals; values are synthetic entries — the same shape the walker-level speculative-synthesis pass produces. Use this to test &&-chain allow logic in isolation without a full harness.
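The way these defaults compose can be sketched as a small builder. This is an illustration under assumptions, not the real `mockContext`: all `Sketch*` types, `sketchPriorEntry`, `sketchMockContext`, and `happenedThisLoop` are hypothetical names, the entry shape is inferred from the `customType` and `data` fields used in the observer tests, and the `0` fallback for an omitted entry index is assumed.

```typescript
type Tool = "bash" | "write" | "edit";
type SketchInput =
  | { command: string }
  | { path: string; content: string }
  | { path: string; edits: unknown[] };
interface SketchWalkerState {
  cwd: string;
  env: Map<string, string>;
  branch?: string;
}
interface SketchEntry {
  customType: string;
  data: Record<string, unknown>;
}
interface SketchOptions {
  cwd?: string;
  agentLoopIndex?: number;
  tool?: Tool;
  input?: SketchInput;
  walkerState?: Partial<SketchWalkerState>;
  entries?: readonly SketchEntry[];
}

// Default input shape per tool, as listed in the `input` option above.
function defaultInputFor(tool: Tool): SketchInput {
  switch (tool) {
    case "bash":
      return { command: "" };
    case "write":
      return { path: "", content: "" };
    case "edit":
      return { path: "", edits: [] };
  }
}

// Stamp the `_agentLoopIndex` key in one place so tests cannot mistype it.
function sketchPriorEntry(
  customType: string,
  data: Record<string, unknown>,
  opts: { agentLoopIndex?: number } = {},
): SketchEntry {
  return {
    customType,
    data: { ...data, _agentLoopIndex: opts.agentLoopIndex ?? 0 },
  };
}

function sketchMockContext(options: SketchOptions = {}) {
  const cwd = options.cwd ?? "/tmp/test";
  const tool = options.tool ?? "bash";
  const agentLoopIndex = options.agentLoopIndex ?? 0;
  const entries = options.entries ?? [];
  return {
    cwd,
    tool,
    agentLoopIndex,
    input: options.input ?? defaultInputFor(tool),
    // Caller-supplied fields are merged over the { cwd, env } base.
    walkerState: { cwd, env: new Map<string, string>(), ...options.walkerState },
    // Simplified stand-in for an `in: "agent_loop"` scope check.
    happenedThisLoop: (customType: string): boolean =>
      entries.some(
        (entry) =>
          entry.customType === customType &&
          entry.data["_agentLoopIndex"] === agentLoopIndex,
      ),
  };
}
```

A hand-rolled entry with a misspelled key (for example `_agentLoopIdx`) would never satisfy `happenedThisLoop` in this model, which is the failure mode the `entries` option warns about.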

Adversarial matrices

An adversarial matrix pins blocking behavior across every shell variant that should trigger a rule — and confirms that false-friend inputs that look like the pattern but should not match are correctly allowed. runMatrix never throws; failures surface in result.cases so all cases are evaluated even when early ones fail.

import { runMatrix, formatMatrix } from "pi-steering/testing";

const result = await runMatrix(harness, [
  { name: "raw",          event: { command: "git push --force" },         expect: "block" },
  { name: "subshell",     event: { command: "sh -c 'git push --force'" }, expect: "block" },
  { name: "sudo",         event: { command: "sudo git push --force" },    expect: "block" },
  { name: "quoted-arg",   event: { command: "git push '--force'" },       expect: "block" },
  { name: "false-friend", event: { command: "echo 'git push --force'" },  expect: "allow" },
]);
console.log(formatMatrix(result));

formatMatrix renders an ASCII-friendly report suited to CI log aggregators:

MATRIX — 5 cases. 5 pass, 0 fail.
================================================================
[raw]          expect:block  actual:BLOCK (no-force-push)
[subshell]     expect:block  actual:BLOCK (no-force-push)
[sudo]         expect:block  actual:BLOCK (no-force-push)
[quoted-arg]   expect:block  actual:BLOCK (no-force-push)
[false-friend] expect:allow  actual:allow
================================================================
PASS: 5/5

Each MatrixCase accepts an expect of "block", "allow", or { block: true, rule: "rule-name" } to also pin which rule fired. The optional cwd field on each case overrides the fallback "/tmp/test" for cwd-scoped rules.
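The three `expect` shapes and the never-throws behavior can be sketched with a toy runner. This is a simplified, synchronous model for illustration only: `Verdict`, `Expectation`, `meets`, `runSketchMatrix`, and `toyEvaluate` are hypothetical names, and the real `runMatrix` is asynchronous and takes a harness rather than a plain function.

```typescript
type Verdict = { blocked: false } | { blocked: true; rule: string };
type Expectation = "block" | "allow" | { block: true; rule: string };
interface SketchCase {
  name: string;
  command: string;
  expect: Expectation;
}
interface SketchCaseResult {
  name: string;
  pass: boolean;
  error?: string;
}

// Compare one verdict against one of the three expectation shapes.
function meets(expect: Expectation, verdict: Verdict): boolean {
  if (expect === "allow") return !verdict.blocked;
  if (expect === "block") return verdict.blocked;
  return verdict.blocked && verdict.rule === expect.rule;
}

// Evaluate every case; record failures and thrown errors instead of throwing.
function runSketchMatrix(
  evaluate: (command: string) => Verdict,
  cases: readonly SketchCase[],
) {
  const results: SketchCaseResult[] = cases.map((c) => {
    try {
      return { name: c.name, pass: meets(c.expect, evaluate(c.command)) };
    } catch (err) {
      return { name: c.name, pass: false, error: String(err) };
    }
  });
  const passed = results.filter((r) => r.pass).length;
  return { cases: results, passed, failed: results.length - passed };
}

// Toy evaluator standing in for a harness with a single rule.
const toyEvaluate = (command: string): Verdict =>
  /^git\s+push\b.*--force/.test(command)
    ? { blocked: true, rule: "no-force-push" }
    : { blocked: false };

const outcome = runSketchMatrix(toyEvaluate, [
  { name: "raw", command: "git push --force", expect: "block" },
  { name: "pinned-rule", command: "git push --force", expect: { block: true, rule: "no-force-push" } },
  { name: "wrong-rule", command: "git push --force", expect: { block: true, rule: "other-rule" } },
  { name: "plain-push", command: "git push", expect: "allow" },
]);
console.log(`${outcome.passed} pass, ${outcome.failed} fail`);
```

The deliberately wrong `wrong-rule` case fails without stopping the run, so the later `plain-push` case is still evaluated and reported.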

Use matrices to cover the adversarial cases the README highlights — wrappers, subshells, quoted args, and false-friend strings that look like the pattern but shouldn’t match. AST-backed evaluation means echo 'git push --force' does not match a ^git\s+push rule because the pattern tests actual command refs, not substrings of arguments. A matrix that includes the false-friend case documents that guarantee and would catch a regression to substring matching.
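The difference between matching command refs and matching substrings can be shown with a deliberately naive model. This toy is far simpler than a real shell AST and is not how pi-steering parses commands: `tokenize`, `commandRef`, `matchesByRef`, and `matchesBySubstring` are hypothetical helpers that only understand single quotes, a leading `sudo`, and `sh -c`.

```typescript
// Toy tokenizer: whitespace-separated words, with single-quoted spans
// collapsed into one token (quotes removed).
function tokenize(command: string): string[] {
  const tokens: string[] = [];
  const pattern = /'([^']*)'|(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(command)) !== null) {
    tokens.push(match[1] ?? match[2] ?? "");
  }
  return tokens;
}

// Resolve the command that would actually run, unwrapping `sudo` and `sh -c`.
function commandRef(command: string): string {
  const tokens = tokenize(command);
  if (tokens[0] === "sudo") {
    return commandRef(tokens.slice(1).join(" "));
  }
  const inner = tokens[2];
  if (tokens[0] === "sh" && tokens[1] === "-c" && inner !== undefined) {
    return commandRef(inner);
  }
  return tokens.join(" ");
}

// Ref-based matching: the pattern is anchored to the resolved command.
function matchesByRef(command: string): boolean {
  return /^git\s+push\b.*--force/.test(commandRef(command));
}

// Substring matching: the regression a false-friend case is meant to catch.
function matchesBySubstring(command: string): boolean {
  return /git\s+push.*--force/.test(command);
}

for (const command of [
  "git push --force",
  "sh -c 'git push --force'",
  "sudo git push --force",
  "git push '--force'",
  "echo 'git push --force'",
]) {
  console.log(command, matchesByRef(command), matchesBySubstring(command));
}
```

In this model the `echo` case is the only one where the two strategies disagree, which is why a single false-friend row is enough to detect a slide back to substring matching.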

`LoadHarnessOptions`

config

SteeringConfig

required

The SteeringConfig to test with. Passed directly into the merger pipeline — same resolvePlugins + buildEvaluator + buildObserverDispatcher path production uses.

includeDefaults

boolean

default: false

Prepend DEFAULT_RULES and DEFAULT_PLUGINS to the config at the innermost position. Mirrors the production !config.disableDefaults flag, but kept explicit here so tests can exercise default rules without editing the config under test.

Note: unlike production, loadHarness does not throw on error-class diagnostics. It surfaces them in harness.diagnostics so plugin-author tests can assert on them directly.

host

EvaluatorHost

Custom host to drive exec and appendEntry. Defaults to an in-memory stub whose exec rejects with a clear error and whose appendEntry is a silent sink. Pass a createRecordingHost() instance when a test needs to inspect what the engine wrote across multiple calls — for example, to verify a self-marking onFire rule wrote the correct session entry before the block verdict returned.
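The idea behind a recording host is to capture every side effect in order so a test can assert on them afterwards. The sketch below illustrates that pattern only; it is not the real `createRecordingHost` or `EvaluatorHost` interface. `createSketchRecordingHost`, `toySelfMarkingRule`, the `"force-push-attempted"` entry type, and the canned exec result are all hypothetical.

```typescript
interface RecordedCall {
  kind: "exec" | "appendEntry";
  detail: string;
}

// Hypothetical recording host: logs each call and keeps appended entries.
function createSketchRecordingHost() {
  const calls: RecordedCall[] = [];
  const entries: Array<{ customType: string; data: unknown }> = [];
  return {
    calls,
    entries,
    exec(
      cmd: string,
      args: string[],
    ): Promise<{ stdout: string; stderr: string; exitCode: number }> {
      calls.push({ kind: "exec", detail: [cmd, ...args].join(" ") });
      // Canned empty success; a real test would return meaningful output.
      return Promise.resolve({ stdout: "", stderr: "", exitCode: 0 });
    },
    appendEntry(customType: string, data: unknown): void {
      calls.push({ kind: "appendEntry", detail: customType });
      entries.push({ customType, data });
    },
  };
}

// Toy stand-in for a self-marking rule: write a session entry, then block.
function toySelfMarkingRule(
  host: { appendEntry(customType: string, data: unknown): void },
  command: string,
) {
  host.appendEntry("force-push-attempted", { command });
  return { blocked: true as const, rule: "no-force-push" };
}

const host = createSketchRecordingHost();
const verdict = toySelfMarkingRule(host, "git push --force");
console.log(verdict.blocked, host.entries.length, host.calls[0]?.kind);
```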

Available exports from `pi-steering/testing`

All primitives are available from the pi-steering/testing subpath. They are also re-exported from the package root for discoverability.

Export	Purpose
`loadHarness`	Build an evaluator + dispatcher pair from a static `SteeringConfig`
`expectBlocks`	Assert an event is blocked; optionally pin which rule and reason
`expectAllows`	Assert an event is allowed
`expectRuleFires`	Thin alias over `expectBlocks` for tests focused on which rule fired
`runMatrix`	Batch-evaluate a list of cases; never throws — failures surface in `result.cases`
`formatMatrix`	Render a `MatrixResult` as an ASCII-friendly CI report
`testPredicate`	Drive a single `PredicateHandler` against a `mockContext`
`testObserver`	Drive an `Observer.onResult` at a synthetic event; captures `appendEntry` writes
`mockContext`	Build a `PredicateContext` for unit-testing predicates in isolation
`mockExtensionContext`	Build a minimal `ExtensionContext` stub backed by a `RecordedSessionEntry` array
`mockObserverContext`	Build an `ObserverContext` for unit-testing observer handlers
`priorEntry`	Build a `MockEntry` with the `_agentLoopIndex` tag stamped correctly
`createRecordingHost`	Build a recording `EvaluatorHost` that captures every `exec` and `appendEntry` call
`getAppendedEntries`	Read the `appendEntry` capture buffer for a mock context
