pi-steering exports a pi-steering/testing subpath with primitives that exercise the full evaluation pipeline without booting pi. loadHarness builds the same evaluator and observer dispatcher pair that production uses; expectBlocks and expectAllows then drive events through it and assert verdicts in a single call. All tests are deterministic and CI-safe — no pi runtime stub, no file-system walk-up, no live exec unless you explicitly wire one.

Import path

import {
  loadHarness,
  expectBlocks,
  expectAllows,
  testPredicate,
  testObserver,
  runMatrix,
  formatMatrix,
} from "pi-steering/testing";

Harness-level testing

loadHarness accepts a static SteeringConfig and returns a Harness — a build-once, invoke-many handle whose evaluate and dispatch methods are identical in signature to the production runtime’s. Tests that call harness.evaluate directly are invoking exactly the same code path pi does on every tool_call event.
const harness = loadHarness({
  config: { plugins: [myPlugin], rules: [...] },
});

expectBlocks

Assert that the harness returns a block verdict for the given event. Throws on allow.
await expectBlocks(
  harness,
  { command: "git push --force" },
  { rule: "no-force-push" },
);
The second argument accepts bash, write, and edit shorthand forms:
  • { command } — bash tool call
  • { write: { path, content } } — write tool call
  • { edit: { path, edits } } — edit tool call
The optional third argument narrows the assertion. rule checks that the named rule fired (matched against the [steering:<rule>@<source>] prefix in the reason string; the source suffix is ignored). reason accepts an exact string or a RegExp for a looser match. Omit both to assert only that something blocked.
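To make the rule check concrete, here is a minimal sketch of how a rule name could be read out of a reason string that carries the [steering:<rule>@<source>] prefix. The helper ruleFromReason and the sample reason strings are hypothetical illustrations, not the library's implementation; only the prefix shape is taken from the description above.

```typescript
// Hypothetical helper (not part of pi-steering): extract the rule name from
// a reason string that starts with "[steering:<rule>@<source>]".
function ruleFromReason(reason: string): string | undefined {
  // Capture the text between "steering:" and "@". The source suffix after
  // "@" is matched but deliberately discarded.
  const match = /^\[steering:([^@\]]+)@[^\]]*\]/.exec(reason);
  return match?.[1];
}

// Sample reason strings invented for illustration.
const fromUser = "[steering:no-force-push@user] Force pushes are not allowed.";
const fromPlugin = "[steering:no-force-push@my-plugin] Force pushes are not allowed.";

console.log(ruleFromReason(fromUser)); // no-force-push
console.log(ruleFromReason(fromPlugin)); // no-force-push
console.log(ruleFromReason("no prefix here")); // undefined
```

Because the source suffix is dropped, the same { rule: "no-force-push" } assertion would hold whichever config layer contributed the rule.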

expectAllows

Assert that the harness allows the given event — no rule fires.
await expectAllows(harness, { command: "git push" });
Throws with a rich message — including which rule fired and its full reason — if the event is unexpectedly blocked.

Full example from the work-item-plugin

The canonical plugin example wires a minimal plugin (just the predicate the rule needs) to keep each rule test focused:
import { describe, it } from "node:test";
import { expectAllows, expectBlocks, loadHarness } from "pi-steering/testing";
import type { Plugin } from "pi-steering";
import { workItemFormat } from "../predicates/work-item-format.ts";
import { commitRequiresWorkItem } from "./commit-requires-work-item.ts";

const testPlugin: Plugin = {
  name: "test",
  predicates: { workItemFormat },
};

describe("commit-requires-work-item", () => {
  const harness = loadHarness({
    config: {
      plugins: [testPlugin],
      rules: [commitRequiresWorkItem],
    },
  });

  it("blocks a commit missing the work-item tag", async () => {
    await expectBlocks(
      harness,
      { command: 'git commit -m "feat: add thing"' },
      { rule: "commit-requires-work-item" },
    );
  });

  it("allows a commit containing [PROJ-N]", async () => {
    await expectAllows(harness, {
      command: 'git commit -m "feat: add thing [PROJ-42]"',
    });
  });

  it("allows commits using --message long form", async () => {
    await expectAllows(harness, {
      command: 'git commit --message "fix [PROJ-1] bad thing"',
    });
  });

  it("does NOT fire on git log --grep=\"commit\"", async () => {
    await expectAllows(harness, {
      command: 'git log --grep="commit"',
    });
  });
});

Unit-level testing

For predicates and observers that you want to exercise in complete isolation — without standing up a full harness — the testing subpath provides testPredicate and testObserver.

testPredicate

Build a PredicateContext from MockContextOptions and call the handler directly. Returns the boolean verdict.
const fires = await testPredicate(branch, /^main$/, {
  walkerState: { branch: "main" },
});
Supply only the MockContextOptions fields your predicate reads. The defaults fill in cwd: "/tmp/test", an empty env map, and an exec stub that rejects loudly if accidentally called — so a test that forgets to wire exec fails explicitly rather than silently evaluating against an empty result.
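The value of a loudly rejecting default can be sketched without the library. The ExecResult shape, the Exec type, and both stubs below are assumptions made for illustration; the real types and error text may differ.

```typescript
// Hypothetical shapes for illustration; the real ExecResult may differ.
interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}
type Exec = (cmd: string, args: string[]) => Promise<ExecResult>;

// A default in the spirit described above: every call rejects with a
// message naming the command, so a forgotten stub fails the test visibly.
const rejectingExec: Exec = (cmd, args) =>
  Promise.reject(
    new Error(`exec("${[cmd, ...args].join(" ")}") was called without a stub`),
  );

// An explicit stub a test might wire in when its predicate shells out.
const stubbedExec: Exec = (cmd, args) =>
  Promise.resolve({
    stdout: cmd === "git" && args[0] === "branch" ? "main\n" : "",
    stderr: "",
    exitCode: 0,
  });

rejectingExec("git", ["status"]).catch((err: Error) => {
  console.log(err.message); // exec("git status") was called without a stub
});
```

A silent default that resolved with empty output would let a predicate evaluate against nothing and still return a verdict, which is the failure mode the rejecting default is meant to prevent.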

testObserver

Drive an observer’s onResult handler against a synthetic tool_result event and inspect what it appended.
const { entries, watchMatched } = await testObserver(
  myObserver,
  { toolName: "bash", input: { command: "npm test" }, output: {}, exitCode: 0 },
);
watchMatched tells you whether the observer’s watch filter accepted the event — onResult is only called when it does, mirroring production dispatch. entries is the list of appendEntry writes captured from the handler. The npm-test-tracker test suite from the work-item-plugin example demonstrates the full contract:
import { describe, it } from "node:test";
import assert from "node:assert/strict";
import { testObserver } from "pi-steering/testing";
import { TEST_PASSED_EVENT, npmTestTracker } from "./npm-test-tracker.ts";

describe("npm-test-tracker observer", () => {
  it("records a TEST_PASSED_EVENT entry on successful `npm test`", async () => {
    const { entries, watchMatched } = await testObserver(
      npmTestTracker,
      {
        toolName: "bash",
        input: { command: "npm test" },
        output: {},
        exitCode: 0,
      },
    );

    assert.equal(watchMatched, true);
    assert.equal(entries.length, 1);
    assert.equal(entries[0]?.customType, TEST_PASSED_EVENT);
    assert.deepEqual(entries[0]?.data, {
      command: "npm test",
      // The dispatcher auto-tags plain-object payloads with
      // `_agentLoopIndex` — default mockObserverContext has index 0.
      _agentLoopIndex: 0,
    });
  });

  it("does NOT fire on a failed `npm test` (exitCode non-zero)", async () => {
    const { entries, watchMatched } = await testObserver(
      npmTestTracker,
      { toolName: "bash", input: { command: "npm test" }, output: {}, exitCode: 1 },
    );
    assert.equal(watchMatched, false);
    assert.equal(entries.length, 0);
  });

  it("does NOT fire on an unrelated command", async () => {
    const { entries, watchMatched } = await testObserver(
      npmTestTracker,
      { toolName: "bash", input: { command: "npm install" }, output: {}, exitCode: 0 },
    );
    assert.equal(watchMatched, false);
    assert.equal(entries.length, 0);
  });

  it("tags writes with the agent-loop index from ctx", async () => {
    const { entries } = await testObserver(
      npmTestTracker,
      { toolName: "bash", input: { command: "npm test" }, output: {}, exitCode: 0 },
      { agentLoopIndex: 42 },
    );
    const data = entries[0]?.data as { _agentLoopIndex: number };
    assert.equal(data._agentLoopIndex, 42);
  });
});
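The _agentLoopIndex tag asserted in these tests is what loop-scoped filtering depends on. The sketch below models that idea under stated assumptions: the SketchEntry shape, the tagEntry and happenedInLoop helpers, and the "test-passed" event name are all hypothetical, and only the _agentLoopIndex key comes from the documentation.

```typescript
// Hypothetical entry shape; only the _agentLoopIndex key is documented.
interface SketchEntry {
  customType: string;
  data: Record<string, unknown>;
}

// Stamp the loop index onto a plain-object payload.
function tagEntry(
  customType: string,
  data: Record<string, unknown>,
  agentLoopIndex: number,
): SketchEntry {
  return { customType, data: { ...data, _agentLoopIndex: agentLoopIndex } };
}

// True when an entry of the given type was written during the given loop.
function happenedInLoop(
  entries: readonly SketchEntry[],
  customType: string,
  currentLoop: number,
): boolean {
  return entries.some(
    (e) => e.customType === customType && e.data["_agentLoopIndex"] === currentLoop,
  );
}

const tagged = tagEntry("test-passed", { command: "npm test" }, 42);
// A hand-rolled literal with a typo in the key name (lowercase "l").
const typo: SketchEntry = {
  customType: "test-passed",
  data: { command: "npm test", _agentloopIndex: 42 },
};

console.log(happenedInLoop([tagged], "test-passed", 42)); // true
console.log(happenedInLoop([tagged], "test-passed", 43)); // false
console.log(happenedInLoop([typo], "test-passed", 42)); // false
```

The last line shows why a misspelled key is dangerous: nothing throws, the entry simply never matches.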

MockContextOptions knobs

testPredicate (and mockContext directly) accept these options:
cwd: string (default: "/tmp/test")
The effective cwd the predicate sees via ctx.cwd and ctx.walkerState.cwd.
agentLoopIndex: number (default: 0)
The engine’s agent-loop counter. Rules that scope a when.happened check with { in: "agent_loop" } compare this value against the _agentLoopIndex tag on session entries, so set it to match the entries you supply via priorEntry.
tool: "bash" | "write" | "edit" (default: "bash")
Which tool the predicate is evaluating under. Drives the default shape of input when no input is supplied.
input: PredicateToolInput
Full tool input. Derived from tool when omitted: bash gives { command: "" }, write gives { path: "", content: "" }, edit gives { path: "", edits: [] }.
walkerState: Partial<WhenWalkerState>
Walker-state snapshot the predicate sees. Merged over { cwd, env: new Map() } — supply only the fields your predicate reads. Pass { branch: "main" } to test a branch predicate without wiring the full git tracker.
exec: (cmd, args, opts?) => ExecResult | Promise<ExecResult>
Stub for ctx.exec. Defaults to rejecting with a clear error — tests that call out to exec must stub explicitly.
entries: ReadonlyArray<MockEntry>
Prior session entries findEntries reads from. Build entries with priorEntry(customType, data, { agentLoopIndex }) so the _agentLoopIndex tag is stamped exactly as the live engine would stamp it — a hand-rolled literal with a typo on the key name silently fails to match when.happened scope filtering.
toolCallEvents: Record<string, readonly SyntheticEntry[]>
Per-ref speculative events for when.happened: { in: "tool_call" } simulation. Keys are the customType event literals; values are synthetic entries — the same shape the walker-level speculative-synthesis pass produces. Use this to test &&-chain allow logic in isolation without a full harness.
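The defaulting rules for input and walkerState can be pictured as two small functions. This is a sketch under assumptions: the Tool, ToolInput, and WalkerState types and the defaultInput and buildWalkerState helpers are hypothetical stand-ins, written only to mirror the defaults listed above.

```typescript
type Tool = "bash" | "write" | "edit";

// Hypothetical input shapes mirroring the documented defaults.
type ToolInput =
  | { command: string }
  | { path: string; content: string }
  | { path: string; edits: unknown[] };

// Derive the empty input for a tool when the test supplies none.
function defaultInput(tool: Tool): ToolInput {
  switch (tool) {
    case "bash":
      return { command: "" };
    case "write":
      return { path: "", content: "" };
    case "edit":
      return { path: "", edits: [] };
  }
}

// Hypothetical walker-state shape; the real WhenWalkerState has more fields.
interface WalkerState {
  cwd: string;
  env: Map<string, string>;
  branch?: string;
}

// Merge a partial walker state over the { cwd, env } base, so a test only
// names the fields its predicate reads.
function buildWalkerState(
  cwd: string,
  overrides: Partial<WalkerState> = {},
): WalkerState {
  return { cwd, env: new Map<string, string>(), ...overrides };
}

const state = buildWalkerState("/tmp/test", { branch: "main" });
console.log(state.cwd, state.branch, state.env.size); // /tmp/test main 0
console.log(JSON.stringify(defaultInput("edit"))); // {"path":"","edits":[]}
```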

Adversarial matrices

An adversarial matrix pins blocking behavior across every shell variant that should trigger a rule, and confirms that false-friend inputs (strings that look like the pattern but should not match) are allowed. runMatrix never throws: failures surface in result.cases, so every case is evaluated even when earlier ones fail.
import { runMatrix, formatMatrix } from "pi-steering/testing";

const result = await runMatrix(harness, [
  { name: "raw",          event: { command: "git push --force" },         expect: "block" },
  { name: "subshell",     event: { command: "sh -c 'git push --force'" }, expect: "block" },
  { name: "sudo",         event: { command: "sudo git push --force" },    expect: "block" },
  { name: "quoted-arg",   event: { command: "git push '--force'" },       expect: "block" },
  { name: "false-friend", event: { command: "echo 'git push --force'" },  expect: "allow" },
]);
console.log(formatMatrix(result));
formatMatrix renders an ASCII-friendly report suited to CI log aggregators:
MATRIX — 5 cases. 5 pass, 0 fail.
================================================================
[raw]          expect:block  actual:BLOCK (no-force-push)
[subshell]     expect:block  actual:BLOCK (no-force-push)
[sudo]         expect:block  actual:BLOCK (no-force-push)
[quoted-arg]   expect:block  actual:BLOCK (no-force-push)
[false-friend] expect:allow  actual:allow
================================================================
PASS: 5/5
Each MatrixCase accepts an expect of "block", "allow", or { block: true, rule: "rule-name" } to also pin which rule fired. The optional cwd field on each case overrides the fallback "/tmp/test" for cwd-scoped rules.
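The three expect forms and the collect-instead-of-throw behavior can be modeled in a few lines. Everything here is a hypothetical sketch: the Verdict, Expectation, Case, and CaseResult types, the runCases function, and the toy evaluator are illustrations rather than the library's code. The sketch is synchronous for brevity, and treating evaluator errors as failed cases is an assumption.

```typescript
// Hypothetical types for illustration; the real MatrixCase may differ.
type Verdict = { block: false } | { block: true; rule: string };
type Expectation = "block" | "allow" | { block: true; rule: string };

interface Case {
  name: string;
  command: string;
  expect: Expectation;
}

interface CaseResult {
  name: string;
  pass: boolean;
  detail: string;
}

// Compare one verdict against one expectation.
function matches(expect: Expectation, actual: Verdict): boolean {
  if (expect === "allow") return !actual.block;
  if (expect === "block") return actual.block;
  return actual.block && actual.rule === expect.rule;
}

// Evaluate every case and collect outcomes. Errors from the evaluator are
// caught and recorded as failures, so one bad case cannot hide the rest.
function runCases(
  evaluate: (command: string) => Verdict,
  cases: readonly Case[],
): CaseResult[] {
  return cases.map((c) => {
    try {
      const actual = evaluate(c.command);
      const detail = actual.block ? `BLOCK (${actual.rule})` : "allow";
      return { name: c.name, pass: matches(c.expect, actual), detail };
    } catch (err) {
      return { name: c.name, pass: false, detail: `error: ${String(err)}` };
    }
  });
}

// Toy evaluator standing in for a harness.
const toyEvaluate = (command: string): Verdict =>
  /^git\s+push\b.*--force/.test(command)
    ? { block: true, rule: "no-force-push" }
    : { block: false };

const results = runCases(toyEvaluate, [
  { name: "raw", command: "git push --force", expect: "block" },
  { name: "pinned", command: "git push --force", expect: { block: true, rule: "no-force-push" } },
  { name: "wrong-rule", command: "git push --force", expect: { block: true, rule: "other-rule" } },
  { name: "plain", command: "git push", expect: "allow" },
]);
const failed = results.filter((r) => !r.pass).map((r) => r.name);
console.log(failed.join(",")); // wrong-rule
```

In a real suite, the equivalent final step would be a single assertion that no case in the matrix result failed, with the formatted report attached as the failure message.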
Use matrices to cover the adversarial cases the README highlights — wrappers, subshells, quoted args, and false-friend strings that look like the pattern but shouldn’t match. AST-backed evaluation means echo 'git push --force' does not match a ^git\s+push rule because the pattern tests actual command refs, not substrings of arguments. A matrix that includes the false-friend case documents that guarantee and would catch a regression to substring matching.
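The difference between matching command refs and matching substrings can be shown with a toy parser. This is a deliberately tiny sketch, not the AST evaluation pi-steering performs: the words tokenizer handles only single quotes, commandRefs unwraps only sudo and sh -c, and the force-push pattern is a hypothetical stand-in for a real rule.

```typescript
// Split a simple command line into words, honoring single quotes.
// Deliberately tiny: no double quotes, escapes, pipes, or && handling.
function words(line: string): string[] {
  const out: string[] = [];
  let current = "";
  let inQuote = false;
  let started = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (ch === "'") {
      inQuote = !inQuote;
      started = true;
    } else if (ch === " " && !inQuote) {
      if (started) out.push(current);
      current = "";
      started = false;
    } else {
      current += ch;
      started = true;
    }
  }
  if (started) out.push(current);
  return out;
}

// Collect the commands a line would actually run, unwrapping two wrappers:
// `sudo <cmd>` and `sh -c '<script>'`.
function commandRefs(w: string[]): string[][] {
  const [head, flag, script] = w;
  if (head === "sudo") return commandRefs(w.slice(1));
  if (head === "sh" && flag === "-c" && script !== undefined) {
    return commandRefs(words(script));
  }
  return w.length > 0 ? [w] : [];
}

// Hypothetical rule pattern, anchored at the start of a command ref.
const FORCE_PUSH = /^git\s+push\b.*--force\b/;

// Ref-based check: the pattern is tested against each extracted command.
function blocksByRef(line: string): boolean {
  return commandRefs(words(line)).some((ref) => FORCE_PUSH.test(ref.join(" ")));
}

// Naive check: the same idea as an unanchored substring search.
function blocksBySubstring(line: string): boolean {
  return /git\s+push\b.*--force/.test(line);
}

const falseFriend = "echo 'git push --force'";
console.log(blocksByRef("sudo git push --force")); // true
console.log(blocksByRef("git push '--force'")); // true
console.log(blocksByRef(falseFriend)); // false
console.log(blocksBySubstring(falseFriend)); // true
```

The last two lines are the regression a false-friend case guards against: the ref-based check allows the echo, while the substring check would block it.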

LoadHarnessOptions

config: SteeringConfig (required)
The SteeringConfig to test with. Passed directly into the merger pipeline — same resolvePlugins + buildEvaluator + buildObserverDispatcher path production uses.
includeDefaults: boolean (default: false)
Prepend DEFAULT_RULES and DEFAULT_PLUGINS to the config at the innermost position. Mirrors the production !config.disableDefaults flag, but kept explicit here so tests can exercise default rules without editing the config under test.
Note: unlike production, loadHarness does not throw on error-class diagnostics. It surfaces them in harness.diagnostics so plugin-author tests can assert on them directly.
host: EvaluatorHost
Custom host to drive exec and appendEntry. Defaults to an in-memory stub whose exec rejects with a clear error and whose appendEntry is a silent sink. Pass a createRecordingHost() instance when a test needs to inspect what the engine wrote across multiple calls — for example, to verify a self-marking onFire rule wrote the correct session entry before the block verdict returned.
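The idea behind a recording host can be sketched independently of the library. The RecordingHost interface, the makeRecordingHost factory, and the "force-push-attempted" entry name below are hypothetical; the real EvaluatorHost interface and the object returned by createRecordingHost() may expose different members.

```typescript
// Hypothetical shapes; the real EvaluatorHost interface may differ.
interface ExecCall {
  cmd: string;
  args: string[];
}
interface AppendedEntry {
  customType: string;
  data: unknown;
}
interface RecordingHost {
  exec(cmd: string, args: string[]): Promise<{ stdout: string; exitCode: number }>;
  appendEntry(customType: string, data: unknown): void;
  readonly execCalls: ExecCall[];
  readonly appended: AppendedEntry[];
}

// Build a host that remembers every call instead of discarding it.
function makeRecordingHost(): RecordingHost {
  const execCalls: ExecCall[] = [];
  const appended: AppendedEntry[] = [];
  return {
    execCalls,
    appended,
    exec(cmd, args) {
      // Record first, then return a canned success result.
      execCalls.push({ cmd, args });
      return Promise.resolve({ stdout: "", exitCode: 0 });
    },
    appendEntry(customType, data) {
      appended.push({ customType, data });
    },
  };
}

const host = makeRecordingHost();
host.appendEntry("force-push-attempted", { command: "git push --force" });
void host.exec("git", ["status"]);
console.log(host.appended.length, host.execCalls.length); // 1 1
```

With a silent sink, the first call would leave no trace; with a recording host, a test can assert on the captured entry after the verdict returns.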

Available exports from pi-steering/testing

All primitives are available from the pi-steering/testing subpath. They are also re-exported from the package root for discoverability.
| Export | Purpose |
| --- | --- |
| loadHarness | Build an evaluator + dispatcher pair from a static SteeringConfig |
| expectBlocks | Assert an event is blocked; optionally pin which rule and reason |
| expectAllows | Assert an event is allowed |
| expectRuleFires | Thin alias over expectBlocks for tests focused on which rule fired |
| runMatrix | Batch-evaluate a list of cases; never throws, failures surface in result.cases |
| formatMatrix | Render a MatrixResult as an ASCII-friendly CI report |
| testPredicate | Drive a single PredicateHandler against a mockContext |
| testObserver | Drive an Observer.onResult at a synthetic event; captures appendEntry writes |
| mockContext | Build a PredicateContext for unit-testing predicates in isolation |
| mockExtensionContext | Build a minimal ExtensionContext stub backed by a RecordedSessionEntry array |
| mockObserverContext | Build an ObserverContext for unit-testing observer handlers |
| priorEntry | Build a MockEntry with the _agentLoopIndex tag stamped correctly |
| createRecordingHost | Build a recording EvaluatorHost that captures every exec and appendEntry call |
| getAppendedEntries | Read the appendEntry capture buffer for a mock context |
