Trigger testing validates whether your skill activates when it should (and doesn’t activate when it shouldn’t). Skill Lab runs your trigger test cases through a real agent runtime and analyzes execution traces to verify correct skill invocation.

Overview

Trigger testing uses 4 prompt types to ensure robust skill activation:
  • explicit: skill named directly with the $ prefix (e.g., "$my-skill do something")
  • implicit: describes the scenario without naming the skill (e.g., "I need to run the test suite")
  • contextual: realistic noisy prompt with domain context (e.g., "This React app needs better test coverage. Can you help?")
  • negative: should NOT trigger; catches false positives (e.g., "How do I install Python?")
Trigger testing requires the Claude CLI (npm install -g @anthropic-ai/claude-code) or the Codex CLI; both runtimes are supported.

Prerequisites

1. Install Claude CLI

Install the Claude Code CLI tool:
npm install -g @anthropic-ai/claude-code
Verify installation:
claude --version
2. Create or generate tests

You need trigger test definitions in .skill-lab/tests/triggers.yaml. Either create them manually or use LLM-powered test generation:
sklab generate ./my-skill

Test Definition Format

Trigger tests are defined in .skill-lab/tests/triggers.yaml:
triggers.yaml
skill: my-skill

test_cases:
  - id: explicit-1
    type: explicit
    prompt: "$my-skill run the test suite"
    expected: trigger

  - id: implicit-1
    type: implicit
    prompt: "I need to run all the tests for this project"
    expected: trigger

  - id: contextual-1
    type: contextual
    prompt: "This React app has some failing tests in the CI pipeline. Can you investigate and fix them?"
    expected: trigger

  - id: negative-1
    type: negative
    prompt: "How do I install Python?"
    expected: no_trigger

Required Fields

  • id (string): unique test identifier (e.g., explicit-1)
  • type (enum): one of explicit, implicit, contextual, or negative
  • prompt (string): the prompt to send to the agent
  • expected (enum): trigger or no_trigger

Optional Fields

  • name (string): human-readable test name (defaults to the id)
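
For example, a named variant of the first test case above:

- id: explicit-1
  type: explicit
  name: "Direct invocation"
  prompt: "$my-skill run the test suite"
  expected: trigger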

Running Trigger Tests

Basic Usage

# Run all trigger tests
sklab trigger ./my-skill

# Run from current directory
sklab trigger
The command will:
  1. Load test cases from .skill-lab/tests/triggers.yaml
  2. Execute each prompt through the Claude runtime
  3. Analyze execution traces for skill invocations
  4. Report pass/fail for each test

Example Output

Trigger Test Report: my-skill
Runtime: claude │ Duration: 12.3s │ 11/13 passed

╭────────────────────────────────────────────────────────╮
│ Test                              Type        Status  │
├───────────────────────────────────────────────────────┤
│ Direct invocation                 explicit    ✓       │
│ Skill name with context           explicit    ✓       │
│ $ prefix invocation               explicit    ✓       │
│ Describe test scenario            implicit    ✓       │
│ Request without naming            implicit    ✓       │
│ Problem statement                 implicit    ✗       │
│ Realistic noisy prompt            contextual  ✓       │
│ Domain-specific context           contextual  ✓       │
│ Multi-step workflow               contextual  ✓       │
│ Unrelated question                negative    ✓       │
│ Similar domain, wrong task        negative    ✓       │
│ Different skill's territory       negative    ✗       │
│ Edge case confusion               negative    ✓       │
╰────────────────────────────────────────────────────────╯

By type: explicit: 3/3 (100%) │ implicit: 2/3 (67%) │ contextual: 3/3 (100%) │ negative: 3/4 (75%)

Filter by Trigger Type

Run only tests of a specific type:
# Test only explicit triggers
sklab trigger --type explicit

# Test only negative cases (false positive detection)
sklab trigger --type negative

# Test implicit activation
sklab trigger --type implicit
Valid types: explicit, implicit, contextual, negative

JSON Output

Generate machine-readable JSON output:
# Print to stdout
sklab trigger --format json

# Save to file
sklab trigger --output report.json --format json
{
  "skill_path": "/path/to/my-skill",
  "skill_name": "my-skill",
  "timestamp": "2026-03-03T14:30:00Z",
  "duration_ms": 12345.6,
  "runtime": "claude",
  "tests_run": 13,
  "tests_passed": 11,
  "tests_failed": 2,
  "overall_pass": false,
  "pass_rate": 0.846,
  "results": [
    {
      "test_id": "explicit-1",
      "test_name": "Direct invocation",
      "trigger_type": "explicit",
      "passed": true,
      "skill_triggered": true,
      "expected_trigger": true,
      "message": "Test passed: Direct invocation",
      "trace_path": "/path/to/.skill-lab/traces/explicit-1.jsonl",
      "events_count": 42,
      "exit_code": 0
    }
  ],
  "summary_by_type": {
    "explicit": {"passed": 3, "failed": 0, "total": 3},
    "implicit": {"passed": 2, "failed": 1, "total": 3},
    "contextual": {"passed": 3, "failed": 0, "total": 3},
    "negative": {"passed": 3, "failed": 1, "total": 4}
  }
}
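
The report is plain JSON, so standard tooling can post-process it; for example, listing the IDs of failed tests with jq:

jq -r '.results[] | select(.passed == false) | .test_id' report.json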

How It Works

1. Test execution

For each test case, Skill Lab:
  1. Launches the Claude CLI with the test prompt
  2. Records the execution trace to .skill-lab/traces/{test-id}.jsonl
  3. Captures the agent’s tool calls and skill invocations
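
Under the hood this is comparable to driving the CLI's non-interactive print mode, along these lines (an illustrative approximation; the exact flags Skill Lab passes are internal):

claude -p "I need to run all the tests for this project" --output-format stream-json > .skill-lab/traces/implicit-1.jsonl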
2. Trace analysis

The trace analyzer examines the JSONL trace file:
  • Scans for skill invocation events
  • Identifies which skills were loaded
  • Tracks the order of tool calls
3. Result validation

Compares actual behavior against expected outcome:
  • Positive tests (expected: trigger): Skill must be invoked
  • Negative tests (expected: no_trigger): Skill must NOT be invoked
Execution traces are saved to .skill-lab/traces/ for debugging. If a test fails, examine the trace file to understand why the skill was (or wasn’t) triggered.
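
Conceptually, steps 2 and 3 reduce to a scan-and-compare. A minimal Python sketch, assuming each JSONL line is one event object and that invocations surface as a skill_invocation event naming the skill (the event shape is illustrative, not Skill Lab's actual trace format):

import json
from pathlib import Path

def skill_triggered(trace_path: Path, skill_name: str) -> bool:
    # Scan every event in the JSONL trace for an invocation of the skill.
    # The "skill_invocation" type and "skill" field are assumed names.
    for line in trace_path.read_text().splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") == "skill_invocation" and event.get("skill") == skill_name:
            return True
    return False

# Validation: a test passes when actual behavior matches the expectation.
expected_trigger = True  # expected: trigger -> True, no_trigger -> False
actual = skill_triggered(Path(".skill-lab/traces/explicit-1.jsonl"), "my-skill")
passed = actual == expected_trigger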

Best Practices

Write Diverse Test Cases

Cover all 4 trigger types for comprehensive validation:
  • Explicit (3+ tests): Direct invocations with variations
  • Implicit (3+ tests): Scenario descriptions without naming the skill
  • Contextual (3+ tests): Realistic prompts with noise and domain context
  • Negative (4+ tests): Adjacent tasks that should NOT trigger

Test Edge Cases

Include negative tests for:
  • Similar domains but different tasks
  • Ambiguous prompts that could match multiple skills
  • Common questions unrelated to your skill’s purpose
Example: Testing Boundaries
# Good negative test: similar domain, wrong task
- id: negative-boundary-1
  type: negative
  prompt: "I need to deploy the application" # deployment, not testing
  expected: no_trigger

# Good negative test: unrelated domain
- id: negative-unrelated-1
  type: negative
  prompt: "How do I configure my database?"
  expected: no_trigger

Optimize Test Runtime

Skill Lab automatically stops execution when the expected skill is triggered (for positive tests). This reduces runtime and API costs.
Trigger testing incurs LLM API costs. Each test runs a full agent session. Use --type filters during development to test specific trigger types.

Interpreting Results

Pass Rates by Type

  • 100% (Excellent): ready for production
  • 80-99% (Good): review failed tests and improve activation logic
  • 60-79% (Fair): significant false positives/negatives
  • < 60% (Poor): skill trigger logic needs redesign

Common Failure Patterns

Implicit tests failing: Skill description may be too narrow or unclear
  • Update skill description to cover implicit scenarios
  • Add more examples to SKILL.md
Negative tests failing: Skill is over-triggering (false positives)
  • Refine skill description to be more specific
  • Add explicit prerequisites or constraints
Contextual tests failing: Skill doesn’t handle noisy, realistic prompts
  • Add domain keywords to skill description
  • Include contextual examples in SKILL.md
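
As a concrete (hypothetical) illustration for a test-running skill, compare an over-broad description with a scoped one:

# Over-broad: triggers on anything test-adjacent, so negative tests fail
description: Helps with testing

# Scoped: names the task and its boundaries
description: Runs and fixes the project's test suite in CI; not for writing new tests or deploying the application.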

CI/CD Integration

name: Trigger Tests
on: [push, pull_request]

jobs:
  trigger-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install dependencies
        run: |
          npm install -g @anthropic-ai/claude-code
          pip install skill-lab
      
      - name: Run trigger tests
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: sklab trigger ./my-skill --format json --output results.json
      
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: trigger-test-results
          path: results.json
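
To gate merges on a minimum pass rate rather than on individual failures, the pass_rate field in the JSON report can drive a follow-up step (a sketch; the 0.9 threshold is an arbitrary example, and jq is preinstalled on GitHub-hosted runners):

      - name: Enforce pass-rate threshold
        run: |
          rate=$(jq '.pass_rate' results.json)
          echo "Pass rate: $rate"
          # Fail the job if fewer than 90% of trigger tests passed
          awk -v r="$rate" 'BEGIN { if (r >= 0.9) exit 0; exit 1 }'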

Next Steps

Test Generation

Auto-generate trigger tests using LLMs instead of writing them manually

Static Analysis

Validate skill structure and content quality
