Skill Lab can automatically generate trigger test cases for your skills using the Anthropic API. The sklab generate command reads your SKILL.md and produces ~13 test cases across all 4 trigger types (explicit, implicit, contextual, negative).

Overview

LLM-powered test generation:
  • Reads your skill’s name, description, and markdown content
  • Generates realistic test prompts for each trigger type
  • Creates .skill-lab/tests/triggers.yaml ready for execution
  • Shows token usage and cost estimates
Test generation requires the anthropic package. Install with: pip install skill-lab[generate]

Prerequisites

1. Install optional dependency

Install Skill Lab with the generate extra:
pip install skill-lab[generate]
This installs the anthropic SDK (v0.39.0+).
2. Set API key

Export your Anthropic API key:
export ANTHROPIC_API_KEY=sk-ant-...
Get your key at console.anthropic.com.

Basic Usage

Generate Tests

# Generate tests for a skill
sklab generate ./my-skill

# Generate for current directory
sklab generate
The command will:
  1. Read SKILL.md (name, description, body content)
  2. Call the Anthropic API to generate test cases
  3. Write .skill-lab/tests/triggers.yaml
  4. Display token usage and cost

Example Output

Generating trigger tests...

Generated 13 trigger tests:
  contextual: 3
  explicit: 3
  implicit: 3
  negative: 4

Tokens: 1,234 in + 567 out = 1,801 ($0.0032)

Written to: /path/to/my-skill/.skill-lab/tests/triggers.yaml
Run sklab trigger to execute them.
After generation, review the generated tests in .skill-lab/tests/triggers.yaml. You can edit them manually to add domain-specific test cases.

Command Options

Specify Model

Use a specific Anthropic model:
# Use Sonnet for higher quality tests
sklab generate --model claude-sonnet-4-5-20250929

# Use Haiku (default, faster and cheaper)
sklab generate --model claude-haiku-4-5-20251001
Supported models:
  • claude-haiku-4-5-20251001 (default): Fast, cheap, good quality
  • claude-sonnet-4-5-20250929: Higher quality, more expensive
  • claude-opus-4-6: Highest quality, most expensive

Set Default Model

Set a global default model via environment variable:
export SKLAB_MODEL=claude-sonnet-4-5-20250929
sklab generate ./my-skill
Precedence: --model flag > SKLAB_MODEL env var > default (Haiku)
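The resolution order is simple enough to sketch in a few lines of Python. This is illustrative only; the resolve_model helper below is not Skill Lab's actual code:

import os

DEFAULT_MODEL = "claude-haiku-4-5-20251001"

def resolve_model(flag_value: str | None = None) -> str:
    # --model flag wins, then SKLAB_MODEL, then the built-in Haiku default
    return flag_value or os.environ.get("SKLAB_MODEL") or DEFAULT_MODEL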

Force Overwrite

Overwrite existing test file without prompting:
sklab generate --force
By default, sklab generate prompts before overwriting .skill-lab/tests/triggers.yaml.

Generated Test Structure

The generated triggers.yaml file contains ~13 test cases:
skill: my-testing-skill

test_cases:
  # Explicit triggers (3 tests)
  - id: explicit-1
    type: explicit
    prompt: "$my-testing-skill run all tests"
    expected: trigger

  - id: explicit-2
    type: explicit
    prompt: "$my-testing-skill execute the test suite for this project"
    expected: trigger

  - id: explicit-3
    type: explicit
    prompt: "Use $my-testing-skill to run pytest"
    expected: trigger

  # Implicit triggers (3 tests)
  - id: implicit-1
    type: implicit
    prompt: "I need to run the test suite"
    expected: trigger

  - id: implicit-2
    type: implicit
    prompt: "Can you execute all the tests for this project?"
    expected: trigger

  - id: implicit-3
    type: implicit
    prompt: "Run pytest on the codebase"
    expected: trigger

  # Contextual triggers (3 tests)
  - id: contextual-1
    type: contextual
    prompt: "This Python project uses pytest. Some tests are failing in CI. Can you run them locally?"
    expected: trigger

  - id: contextual-2
    type: contextual
    prompt: "I just refactored the authentication module. Let's make sure nothing broke by running the test suite."
    expected: trigger

  - id: contextual-3
    type: contextual
    prompt: "Before deploying this API, I want to validate that all edge cases are covered. Run the tests."
    expected: trigger

  # Negative triggers (4 tests)
  - id: negative-1
    type: negative
    prompt: "How do I install pytest?"
    expected: no_trigger

  - id: negative-2
    type: negative
    prompt: "Write new unit tests for the payment processing module"
    expected: no_trigger

  - id: negative-3
    type: negative
    prompt: "Explain how to configure pytest.ini"
    expected: no_trigger

  - id: negative-4
    type: negative
    prompt: "What's the difference between pytest and unittest?"
    expected: no_trigger

Test Distribution

| Type       | Count | Purpose                                     |
|------------|-------|---------------------------------------------|
| explicit   | 3     | Direct $skill-name invocations              |
| implicit   | 3     | Scenario descriptions without naming skill  |
| contextual | 3     | Realistic noisy prompts with domain context |
| negative   | 4     | Adjacent requests that should NOT trigger   |
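To confirm the distribution in a generated file yourself, a quick Python check (assumes PyYAML is installed):

import collections
import yaml

with open(".skill-lab/tests/triggers.yaml") as f:
    data = yaml.safe_load(f)

# Tally test cases by trigger type
print(collections.Counter(case["type"] for case in data["test_cases"]))
# Counter({'negative': 4, 'explicit': 3, 'implicit': 3, 'contextual': 3})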

How It Works

1. Skill content extraction

Skill Lab reads your SKILL.md:
  • Extracts name and description from YAML frontmatter
  • Reads the markdown body (up to 4,000 characters)
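The extraction step could look roughly like this. An illustrative sketch, not Skill Lab's actual implementation; it assumes a standard ----fenced YAML frontmatter block and PyYAML:

import yaml

def read_skill(path: str) -> dict:
    text = open(path, encoding="utf-8").read()
    # SKILL.md starts with a '---'-fenced YAML frontmatter block
    _, frontmatter, body = text.split("---", 2)
    meta = yaml.safe_load(frontmatter)
    return {
        "name": meta["name"],
        "description": meta["description"],
        "body": body.strip()[:4000],  # body capped at 4,000 characters
    }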
2. LLM prompt construction

Builds a system prompt with:
  • Instructions to generate trigger tests
  • Expected YAML schema
  • Examples of good test cases
3. API call

Calls the Anthropic API with:
  • System prompt: Instructions for test generation
  • User message: Your skill’s name, description, and content
  • Model: Default (claude-haiku-4-5-20251001) or specified via --model
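With the official anthropic SDK, the call is roughly as follows. This is a sketch; SYSTEM_PROMPT and skill_summary are placeholders standing in for the values built in steps 1 and 2:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "..."   # test-generation instructions built in step 2
skill_summary = "..."   # skill name, description, and body from step 1

response = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=4096,                # assumed limit; enough for ~13 test cases
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": skill_summary}],
)
yaml_text = response.content[0].text
usage = response.usage              # input_tokens / output_tokens for the cost line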
4. Response parsing

Parses and validates the LLM response:
  • Strips markdown code fences (if present)
  • Parses YAML structure
  • Validates required fields (id, type, prompt, expected)
  • Forces correct skill name
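A sketch of that validation logic (illustrative, not the actual generator code):

import yaml

def parse_tests(raw: str, skill_name: str) -> dict:
    text = raw.strip()
    if text.startswith("```"):
        # drop the opening fence line and the closing fence
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    data = yaml.safe_load(text)
    for case in data["test_cases"]:
        for field in ("id", "type", "prompt", "expected"):
            if field not in case:
                raise ValueError(f"test case missing required field: {field}")
    data["skill"] = skill_name  # force the correct skill name
    return data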
5. File writing

Writes .skill-lab/tests/triggers.yaml:
  • Creates .skill-lab/tests/ directory if needed
  • Prompts before overwriting (unless --force)
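In outline (again illustrative, not the actual code):

from pathlib import Path

def write_tests(skill_dir: Path, yaml_text: str, force: bool = False) -> Path:
    out = skill_dir / ".skill-lab" / "tests" / "triggers.yaml"
    out.parent.mkdir(parents=True, exist_ok=True)  # create .skill-lab/tests/ if needed
    if out.exists() and not force:
        answer = input(f"Overwrite {out}? [y/N] ")
        if answer.strip().lower() != "y":
            raise SystemExit("Aborted.")
    out.write_text(yaml_text)
    return out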

Token Usage and Pricing

Token Estimates

Skill Lab displays token usage after generation:
Tokens: 1,234 in + 567 out = 1,801 ($0.0032)
  • Input tokens: System prompt + your skill content
  • Output tokens: Generated YAML test cases
  • Cost: Calculated using current Anthropic pricing
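The cost is simple arithmetic over the per-million-token prices in the table below:

def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price: float, out_price: float) -> float:
    # prices are dollars per 1M tokens
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# The example run above on Haiku ($0.80 in, $4.00 out):
estimate_cost(1234, 567, 0.80, 4.00)  # -> 0.0032552, displayed as $0.0032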

Pricing (as of Feb 2025)

| Model           | Input (per 1M tokens) | Output (per 1M tokens) | Typical Cost    |
|-----------------|-----------------------|------------------------|-----------------|
| Haiku (default) | $0.80                 | $4.00                  | $0.002 - $0.005 |
| Sonnet          | $3.00                 | $15.00                 | $0.008 - $0.015 |
| Opus            | $15.00                | $75.00                 | $0.040 - $0.080 |
Pricing is embedded in src/skill_lab/triggers/generator.py and may change. Check Anthropic pricing for current rates.

Customizing Generated Tests

After generation, you can edit .skill-lab/tests/triggers.yaml manually:

Add Domain-Specific Tests

# Add your own test cases
- id: explicit-custom-1
  type: explicit
  prompt: "$my-skill with custom domain-specific context"
  expected: trigger

Refine Negative Tests

Negative tests are crucial for avoiding false positives:
# Test boundary between your skill and adjacent skills
- id: negative-boundary-1
  type: negative
  prompt: "I need to deploy the application"  # deployment, not testing
  expected: no_trigger

Adjust for Prerequisites

If your skill has prerequisites, add tests that check activation logic:
# Should NOT trigger if prerequisites aren't met
- id: negative-prereq-1
  type: negative
  prompt: "Run tests on this JavaScript project"  # assumes Python skill
  expected: no_trigger

Best Practices

Write Clear Skill Descriptions

The quality of generated tests depends on your skill’s description:
---
name: run-python-tests
description: >
  Execute pytest test suite for Python projects.
  Discovers and runs all tests in tests/ directory,
  generates coverage reports, and displays results.
  Requires pytest installed and pytest.ini configured.
---
Clear descriptions help the LLM generate:
  • More realistic implicit/contextual prompts
  • Better negative tests (understanding boundaries)
  • Domain-specific vocabulary

Review Generated Tests

Always review the generated tests before running them:
  1. Check for duplicate or redundant tests (a quick check is sketched after this list)
  2. Verify negative tests cover adjacent domains
  3. Ensure contextual tests are realistic
  4. Add missing edge cases
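For the duplicate check, a few lines of Python are enough (assumes PyYAML):

import yaml

with open(".skill-lab/tests/triggers.yaml") as f:
    data = yaml.safe_load(f)

prompts = [case["prompt"] for case in data["test_cases"]]
dupes = {p for p in prompts if prompts.count(p) > 1}
print(dupes or "No duplicate prompts found.")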

Iterate and Regenerate

If the initial generation is poor:
  1. Improve your skill description in SKILL.md
  2. Add more examples to the skill body
  3. Regenerate: sklab generate --force

Troubleshooting

“ANTHROPIC_API_KEY environment variable is not set”

Solution: Export your API key:
export ANTHROPIC_API_KEY=sk-ant-...
Or add to your shell profile (~/.bashrc, ~/.zshrc):
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
source ~/.bashrc

“The ‘anthropic’ package is required for test generation”

Solution: Install the optional dependency:
pip install skill-lab[generate]

Generated Tests Are Low Quality

Solutions:
  1. Use a better model: --model claude-sonnet-4-5-20250929
  2. Improve skill description: Add more context and examples to SKILL.md
  3. Manually edit: Refine the generated tests in .skill-lab/tests/triggers.yaml

“Failed to parse generated YAML”

The LLM returned invalid YAML. Solutions:
  1. Try regenerating: sklab generate --force
  2. Switch to Sonnet: --model claude-sonnet-4-5-20250929
  3. If persistent, file a bug report with your skill content

Workflow Example

1. Create skill

Write your SKILL.md with clear name, description, and examples:
mkdir my-skill
cd my-skill
# ... create SKILL.md ...
2. Generate tests

Auto-generate trigger tests:
sklab generate
3. Review and refine

Review .skill-lab/tests/triggers.yaml and add custom tests:
cat .skill-lab/tests/triggers.yaml
# Edit manually to add domain-specific tests
4. Run tests

Execute the generated tests:
sklab trigger
5. Iterate

If tests fail or are low quality:
  1. Update SKILL.md description
  2. Regenerate: sklab generate --force
  3. Re-run: sklab trigger

Next Steps

Trigger Testing

Learn how to run and interpret trigger test results

Static Analysis

Validate skill structure and content quality
