Skill Lab can automatically generate trigger test cases for your skills using the Anthropic API. The sklab generate command reads your SKILL.md and produces ~13 test cases across all 4 trigger types (explicit, implicit, contextual, negative).

Overview

LLM-powered test generation:
  • Reads your skill’s name, description, and markdown content
  • Generates realistic test prompts for each trigger type
  • Creates .skill-lab/tests/triggers.yaml ready for execution
  • Shows token usage and cost estimates
Test generation requires the anthropic package. Install with: pip install skill-lab[generate]

Prerequisites

1. Install optional dependency

Install Skill Lab with the generate extra:
pip install skill-lab[generate]
This installs the anthropic SDK (v0.39.0+).
2. Set API key

Export your Anthropic API key:
export ANTHROPIC_API_KEY=sk-ant-...
Get your key at console.anthropic.com.

Basic Usage

Generate Tests

# Generate tests for a skill
sklab generate ./my-skill

# Generate for current directory
sklab generate
The command will:
  1. Read SKILL.md (name, description, body content)
  2. Call the Anthropic API to generate test cases
  3. Write .skill-lab/tests/triggers.yaml
  4. Display token usage and cost

Example Output

Generating trigger tests...

Generated 13 trigger tests:
  contextual: 3
  explicit: 3
  implicit: 3
  negative: 4

Tokens: 1,234 in + 567 out = 1,801 ($0.0032)

Written to: /path/to/my-skill/.skill-lab/tests/triggers.yaml
Run sklab trigger to execute them.
After generation, review the generated tests in .skill-lab/tests/triggers.yaml. You can edit them manually to add domain-specific test cases.

Command Options

Specify Model

Use a specific Anthropic model:
# Use Sonnet for higher quality tests
sklab generate --model claude-sonnet-4-5-20250929

# Use Haiku (default, faster and cheaper)
sklab generate --model claude-haiku-4-5-20251001
Supported models:
  • claude-haiku-4-5-20251001 (default): Fast, cheap, good quality
  • claude-sonnet-4-5-20250929: Higher quality, more expensive
  • claude-opus-4-6: Highest quality, most expensive

Set Default Model

Set a global default model via environment variable:
export SKLAB_MODEL=claude-sonnet-4-5-20250929
sklab generate ./my-skill
Precedence: --model flag > SKLAB_MODEL env var > default (Haiku)
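The resolution order is simple enough to sketch in a few lines of Python. This is illustrative only; the resolve_model helper below is not Skill Lab's actual code:

import os

DEFAULT_MODEL = "claude-haiku-4-5-20251001"

def resolve_model(flag_value: str | None = None) -> str:
    # --model flag wins, then SKLAB_MODEL, then the built-in Haiku default
    return flag_value or os.environ.get("SKLAB_MODEL") or DEFAULT_MODEL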

Force Overwrite

Overwrite existing test file without prompting:
sklab generate --force
By default, sklab generate prompts before overwriting .skill-lab/tests/triggers.yaml.

Generated Test Structure

The generated triggers.yaml file contains ~13 test cases:
skill: my-testing-skill

test_cases:
  # Explicit triggers (3 tests)
  - id: explicit-1
    type: explicit
    prompt: "$my-testing-skill run all tests"
    expected: trigger

  - id: explicit-2
    type: explicit
    prompt: "$my-testing-skill execute the test suite for this project"
    expected: trigger

  - id: explicit-3
    type: explicit
    prompt: "Use $my-testing-skill to run pytest"
    expected: trigger

  # Implicit triggers (3 tests)
  - id: implicit-1
    type: implicit
    prompt: "I need to run the test suite"
    expected: trigger

  - id: implicit-2
    type: implicit
    prompt: "Can you execute all the tests for this project?"
    expected: trigger

  - id: implicit-3
    type: implicit
    prompt: "Run pytest on the codebase"
    expected: trigger

  # Contextual triggers (3 tests)
  - id: contextual-1
    type: contextual
    prompt: "This Python project uses pytest. Some tests are failing in CI. Can you run them locally?"
    expected: trigger

  - id: contextual-2
    type: contextual
    prompt: "I just refactored the authentication module. Let's make sure nothing broke by running the test suite."
    expected: trigger

  - id: contextual-3
    type: contextual
    prompt: "Before deploying this API, I want to validate that all edge cases are covered. Run the tests."
    expected: trigger

  # Negative triggers (4 tests)
  - id: negative-1
    type: negative
    prompt: "How do I install pytest?"
    expected: no_trigger

  - id: negative-2
    type: negative
    prompt: "Write new unit tests for the payment processing module"
    expected: no_trigger

  - id: negative-3
    type: negative
    prompt: "Explain how to configure pytest.ini"
    expected: no_trigger

  - id: negative-4
    type: negative
    prompt: "What's the difference between pytest and unittest?"
    expected: no_trigger

Test Distribution

| Type       | Count | Purpose                                     |
|------------|-------|---------------------------------------------|
| explicit   | 3     | Direct $skill-name invocations              |
| implicit   | 3     | Scenario descriptions without naming skill  |
| contextual | 3     | Realistic noisy prompts with domain context |
| negative   | 4     | Adjacent requests that should NOT trigger   |
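To confirm the distribution in a generated file yourself, a quick Python check (assumes PyYAML is installed):

import collections
import yaml

with open(".skill-lab/tests/triggers.yaml") as f:
    data = yaml.safe_load(f)

# Tally test cases by trigger type
print(collections.Counter(case["type"] for case in data["test_cases"]))
# Counter({'negative': 4, 'explicit': 3, 'implicit': 3, 'contextual': 3})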

How It Works

1. Skill content extraction

Skill Lab reads your SKILL.md:
  • Extracts name and description from YAML frontmatter
  • Reads the markdown body (up to 4,000 characters)
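The extraction step could look roughly like this. An illustrative sketch, not Skill Lab's actual implementation; it assumes a standard ----fenced YAML frontmatter block and PyYAML:

import yaml

def read_skill(path: str) -> dict:
    text = open(path, encoding="utf-8").read()
    # SKILL.md starts with a '---'-fenced YAML frontmatter block
    _, frontmatter, body = text.split("---", 2)
    meta = yaml.safe_load(frontmatter)
    return {
        "name": meta["name"],
        "description": meta["description"],
        "body": body.strip()[:4000],  # body capped at 4,000 characters
    }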
2. LLM prompt construction

Builds a system prompt with:
  • Instructions to generate trigger tests
  • Expected YAML schema
  • Examples of good test cases
3. API call

Calls the Anthropic API with:
  • System prompt: Instructions for test generation
  • User message: Your skill’s name, description, and content
  • Model: Default (claude-haiku-4-5-20251001) or specified via --model
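With the official anthropic SDK, the call is roughly as follows. This is a sketch; SYSTEM_PROMPT and skill_summary are placeholders standing in for the values built in steps 1 and 2:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "..."   # test-generation instructions built in step 2
skill_summary = "..."   # skill name, description, and body from step 1

response = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=4096,                # assumed limit; enough for ~13 test cases
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": skill_summary}],
)
yaml_text = response.content[0].text
usage = response.usage              # input_tokens / output_tokens for the cost line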
4. Response parsing

Parses and validates the LLM response:
  • Strips markdown code fences (if present)
  • Parses YAML structure
  • Validates required fields (id, type, prompt, expected)
  • Forces correct skill name
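A sketch of that validation logic (illustrative, not the actual generator code):

import yaml

def parse_tests(raw: str, skill_name: str) -> dict:
    text = raw.strip()
    if text.startswith("```"):
        # drop the opening fence line and the closing fence
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    data = yaml.safe_load(text)
    for case in data["test_cases"]:
        for field in ("id", "type", "prompt", "expected"):
            if field not in case:
                raise ValueError(f"test case missing required field: {field}")
    data["skill"] = skill_name  # force the correct skill name
    return data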
5. File writing

Writes .skill-lab/tests/triggers.yaml:
  • Creates .skill-lab/tests/ directory if needed
  • Prompts before overwriting (unless --force)
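In outline (again illustrative, not the actual code):

from pathlib import Path

def write_tests(skill_dir: Path, yaml_text: str, force: bool = False) -> Path:
    out = skill_dir / ".skill-lab" / "tests" / "triggers.yaml"
    out.parent.mkdir(parents=True, exist_ok=True)  # create .skill-lab/tests/ if needed
    if out.exists() and not force:
        answer = input(f"Overwrite {out}? [y/N] ")
        if answer.strip().lower() != "y":
            raise SystemExit("Aborted.")
    out.write_text(yaml_text)
    return out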

Token Usage and Pricing

Token Estimates

Skill Lab displays token usage after generation:
Tokens: 1,234 in + 567 out = 1,801 ($0.0032)
  • Input tokens: System prompt + your skill content
  • Output tokens: Generated YAML test cases
  • Cost: Calculated using current Anthropic pricing
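The cost is simple arithmetic over the per-million-token prices in the table below:

def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price: float, out_price: float) -> float:
    # prices are dollars per 1M tokens
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# The example run above on Haiku ($0.80 in, $4.00 out):
estimate_cost(1234, 567, 0.80, 4.00)  # -> 0.0032552, displayed as $0.0032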

Pricing (as of Feb 2025)

| Model           | Input (per 1M tokens) | Output (per 1M tokens) | Typical Cost    |
|-----------------|-----------------------|------------------------|-----------------|
| Haiku (default) | $0.80                 | $4.00                  | $0.002 - $0.005 |
| Sonnet          | $3.00                 | $15.00                 | $0.008 - $0.015 |
| Opus            | $15.00                | $75.00                 | $0.040 - $0.080 |
Pricing is embedded in src/skill_lab/triggers/generator.py and may change. Check Anthropic pricing for current rates.

Customizing Generated Tests

After generation, you can edit .skill-lab/tests/triggers.yaml manually:

Add Domain-Specific Tests

# Add your own test cases
- id: explicit-custom-1
  type: explicit
  prompt: "$my-skill with custom domain-specific context"
  expected: trigger

Refine Negative Tests

Negative tests are crucial for avoiding false positives:
# Test boundary between your skill and adjacent skills
- id: negative-boundary-1
  type: negative
  prompt: "I need to deploy the application"  # deployment, not testing
  expected: no_trigger

Adjust for Prerequisites

If your skill has prerequisites, add tests that check activation logic:
# Should NOT trigger if prerequisites aren't met
- id: negative-prereq-1
  type: negative
  prompt: "Run tests on this JavaScript project"  # assumes Python skill
  expected: no_trigger

Best Practices

Write Clear Skill Descriptions

The quality of generated tests depends on your skill’s description:
---
name: run-python-tests
description: >
  Execute pytest test suite for Python projects.
  Discovers and runs all tests in tests/ directory,
  generates coverage reports, and displays results.
  Requires pytest installed and pytest.ini configured.
---
Clear descriptions help the LLM generate:
  • More realistic implicit/contextual prompts
  • Better negative tests (understanding boundaries)
  • Domain-specific vocabulary

Review Generated Tests

Always review the generated tests before running them:
  1. Check for duplicate or redundant tests (a quick check is sketched after this list)
  2. Verify negative tests cover adjacent domains
  3. Ensure contextual tests are realistic
  4. Add missing edge cases
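For the duplicate check, a few lines of Python are enough (assumes PyYAML):

import yaml

with open(".skill-lab/tests/triggers.yaml") as f:
    data = yaml.safe_load(f)

prompts = [case["prompt"] for case in data["test_cases"]]
dupes = {p for p in prompts if prompts.count(p) > 1}
print(dupes or "No duplicate prompts found.")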

Iterate and Regenerate

If the initial generation is poor:
  1. Improve your skill description in SKILL.md
  2. Add more examples to the skill body
  3. Regenerate: sklab generate --force

Troubleshooting

“ANTHROPIC_API_KEY environment variable is not set”

Solution: Export your API key:
export ANTHROPIC_API_KEY=sk-ant-...
Or add to your shell profile (~/.bashrc, ~/.zshrc):
echo 'export ANTHROPIC_API_KEY=sk-ant-...' >> ~/.bashrc
source ~/.bashrc

“The ‘anthropic’ package is required for test generation”

Solution: Install the optional dependency:
pip install skill-lab[generate]

Generated Tests Are Low Quality

Solutions:
  1. Use a better model: --model claude-sonnet-4-5-20250929
  2. Improve skill description: Add more context and examples to SKILL.md
  3. Manually edit: Refine the generated tests in .skill-lab/tests/triggers.yaml

“Failed to parse generated YAML”

The LLM returned invalid YAML. Solutions:
  1. Try regenerating: sklab generate --force
  2. Switch to Sonnet: --model claude-sonnet-4-5-20250929
  3. If persistent, file a bug report with your skill content

Workflow Example

1. Create skill

Write your SKILL.md with clear name, description, and examples:
mkdir my-skill
cd my-skill
# ... create SKILL.md ...
2. Generate tests

Auto-generate trigger tests:
sklab generate
3. Review and refine

Review .skill-lab/tests/triggers.yaml and add custom tests:
cat .skill-lab/tests/triggers.yaml
# Edit manually to add domain-specific tests
4. Run tests

Execute the generated tests:
sklab trigger
5. Iterate

If tests fail or are low quality:
  1. Update SKILL.md description
  2. Regenerate: sklab generate --force
  3. Re-run: sklab trigger

Next Steps

Trigger Testing

Learn how to run and interpret trigger test results

Static Analysis

Validate skill structure and content quality
