The `sklab generate` command reads your `SKILL.md` and produces ~13 test cases across all four trigger types (explicit, implicit, contextual, and negative).
## Overview
LLM-powered test generation:

- Reads your skill's name, description, and markdown content
- Generates realistic test prompts for each trigger type
- Creates `.skill-lab/tests/triggers.yaml`, ready for execution
- Shows token usage and cost estimates
Test generation requires the `anthropic` package. Install it with `pip install "skill-lab[generate]"`.

## Prerequisites
### Install optional dependency

Install Skill Lab with the `generate` extra: `pip install "skill-lab[generate]"`. This installs the `anthropic` SDK (v0.39.0+).

### Set API key

Export your API key as the `ANTHROPIC_API_KEY` environment variable.
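A minimal setup might look like this (the key value is a placeholder):

```bash
pip install "skill-lab[generate]"
export ANTHROPIC_API_KEY=sk-ant-...
```

Quoting the package spec guards against shells (such as zsh) that treat square brackets as glob patterns.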
## Basic Usage
### Generate Tests

Running `sklab generate` will:

1. Read `SKILL.md` (name, description, body content)
2. Call the Anthropic API to generate test cases
3. Write `.skill-lab/tests/triggers.yaml`
4. Display token usage and cost
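The steps above boil down to a single command, run from the skill's directory (the directory name here is illustrative):

```bash
cd my-skill/     # the folder containing SKILL.md
sklab generate
```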
### Example Output
## Command Options
### Specify Model

Use a specific Anthropic model with `--model`:

- `claude-haiku-4-5-20251001` (default): fast, cheap, good quality
- `claude-sonnet-4-5-20250929`: higher quality, more expensive
- `claude-opus-4-6`: highest quality, most expensive
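For example, to generate with Sonnet:

```bash
sklab generate --model claude-sonnet-4-5-20250929
```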
### Set Default Model

Set a global default model via the `SKLAB_MODEL` environment variable. Precedence: `--model` flag > `SKLAB_MODEL` env var > built-in default (Haiku).
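For example, in your shell profile:

```bash
export SKLAB_MODEL=claude-sonnet-4-5-20250929
```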
### Force Overwrite

By default, `sklab generate` prompts before overwriting `.skill-lab/tests/triggers.yaml`. Use `--force` to overwrite without prompting.
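To skip the confirmation prompt:

```bash
sklab generate --force
```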
## Generated Test Structure
The generated `triggers.yaml` file contains ~13 test cases:
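An illustrative excerpt is below. The skill name, ids, and prompts are invented, and the exact shape of the `expected` field may differ; the schema does use the `id`, `type`, `prompt`, and `expected` fields.

```yaml
# Hypothetical excerpt for a skill named "pdf-tools" -- not real output
- id: explicit-1
  type: explicit
  prompt: "Use $pdf-tools to merge report.pdf and appendix.pdf"
  expected: true
- id: negative-1
  type: negative
  prompt: "Convert this Word document to plain text"
  expected: false
```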
### Test Distribution
| Type | Count | Purpose |
|---|---|---|
| explicit | 3 | Direct $skill-name invocations |
| implicit | 3 | Scenario descriptions without naming skill |
| contextual | 3 | Realistic noisy prompts with domain context |
| negative | 4 | Adjacent requests that should NOT trigger |
## How It Works
### Skill content extraction
Skill Lab reads your SKILL.md:
- Extracts `name` and `description` from the YAML frontmatter
- Reads the markdown body (up to 4,000 characters)
### LLM prompt construction
Builds a system prompt with:
- Instructions to generate trigger tests
- Expected YAML schema
- Examples of good test cases
### API call
Calls the Anthropic API with:
- System prompt: Instructions for test generation
- User message: Your skill’s name, description, and content
- Model: default (`claude-haiku-4-5-20251001`) or specified via `--model`
### Response parsing
Parses and validates the LLM response:
- Strips markdown code fences (if present)
- Parses YAML structure
- Validates required fields (`id`, `type`, `prompt`, `expected`)
- Forces the correct skill name
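The fence-stripping and validation steps can be sketched like this. It is an illustration, not Skill Lab's actual code, and it omits the final step of forcing the correct skill name onto each case.

```python
# Hedged sketch of the parsing steps above. Requires PyYAML.
import yaml

REQUIRED_FIELDS = {"id", "type", "prompt", "expected"}

def parse_response(text: str) -> list[dict]:
    # 1. Strip markdown code fences, if present
    text = text.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1]    # drop the opening fence line
        text = text.rsplit("```", 1)[0]  # drop the closing fence
    # 2. Parse the YAML structure
    cases = yaml.safe_load(text)
    # 3. Validate required fields on every test case
    for case in cases:
        missing = REQUIRED_FIELDS - case.keys()
        if missing:
            raise ValueError(f"test case missing fields: {missing}")
    return cases

raw = "```yaml\n- id: explicit-1\n  type: explicit\n  prompt: Use $pdf-tools to merge these\n  expected: true\n```"
cases = parse_response(raw)
```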
## Token Usage and Pricing
### Token Estimates
Skill Lab displays token usage after generation:

- Input tokens: system prompt + your skill content
- Output tokens: generated YAML test cases
- Cost: calculated using current Anthropic pricing
### Pricing (as of Feb 2025)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Typical Cost |
|---|---|---|---|
| Haiku (default) | $0.80 | $4.00 | ~$0.005 |
| Sonnet | $3.00 | $15.00 | ~$0.015 |
| Opus | $15.00 | $75.00 | ~$0.080 |
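As a sanity check on the "typical cost" column, the arithmetic is simply tokens divided by one million times the per-million rate. The token counts below are assumed example values, not measured figures.

```python
# Back-of-envelope cost check using the per-token rates from the table above.
PRICES = {  # USD per 1M tokens: (input, output)
    "claude-haiku-4-5-20251001": (0.80, 4.00),
    "claude-sonnet-4-5-20250929": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# e.g. ~3,000 input tokens and ~800 output tokens on the default Haiku model:
cost = estimate_cost("claude-haiku-4-5-20251001", 3_000, 800)
# 0.0024 + 0.0032 = 0.0056 -- in line with the ~$0.005 "typical cost" above
```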
Pricing is embedded in `src/skill_lab/triggers/generator.py` and may change. Check Anthropic's pricing page for current rates.

## Customizing Generated Tests
After generation, you can edit `.skill-lab/tests/triggers.yaml` manually:
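For example, you might append hand-written cases like these (all values are illustrative, and the exact shape of `expected` may differ):

```yaml
# Hand-added domain-specific and negative cases (hypothetical skill "pdf-tools")
- id: custom-contextual-1
  type: contextual
  prompt: "Our quarterly report PDFs need their appendices merged before Friday"
  expected: true
- id: custom-negative-1
  type: negative
  prompt: "Summarize this spreadsheet for me"
  expected: false
```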
### Add Domain-Specific Tests
### Refine Negative Tests

Negative tests are crucial for avoiding false positives.

### Adjust for Prerequisites

If your skill has prerequisites, add tests that check its activation logic.

## Best Practices
### Write Clear Skill Descriptions

The quality of the generated tests depends on your skill's description. A clear, specific description yields:

- More realistic implicit/contextual prompts
- Better negative tests (the model understands the skill's boundaries)
- Domain-specific vocabulary
### Review Generated Tests

Always review the generated tests before running them:

- Check for duplicate or redundant tests
- Verify negative tests cover adjacent domains
- Ensure contextual tests are realistic
- Add missing edge cases
### Iterate and Regenerate

If the initial generation is poor:

1. Improve your skill description in `SKILL.md`
2. Add more examples to the skill body
3. Regenerate with `sklab generate --force`
## Troubleshooting
### "ANTHROPIC_API_KEY environment variable is not set"

Solution: export your API key. To persist it across sessions, add the export line to your shell profile (`~/.bashrc`, `~/.zshrc`).
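For the current session (the key value is a placeholder):

```bash
export ANTHROPIC_API_KEY=sk-ant-...
```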
### "The 'anthropic' package is required for test generation"

Solution: install the optional dependency with `pip install "skill-lab[generate]"`.
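The quoting guards against shells (such as zsh) that treat square brackets as glob patterns:

```bash
pip install "skill-lab[generate]"
```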
### Generated Tests Are Low Quality

Solutions:

- Use a better model: `--model claude-sonnet-4-5-20250929`
- Improve the skill description: add more context and examples to `SKILL.md`
- Manually edit: refine the generated tests in `.skill-lab/tests/triggers.yaml`
### "Failed to parse generated YAML"

The LLM returned invalid YAML. Solutions:

- Try regenerating: `sklab generate --force`
- Switch to Sonnet: `--model claude-sonnet-4-5-20250929`
- If the problem persists, file a bug report with your skill content
## Workflow Example
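A possible end-to-end session, using only the commands and environment variables documented above (the key value and editor invocation are placeholders):

```bash
pip install "skill-lab[generate]"        # one-time setup
export ANTHROPIC_API_KEY=sk-ant-...      # placeholder key
sklab generate                           # writes .skill-lab/tests/triggers.yaml
$EDITOR .skill-lab/tests/triggers.yaml   # review and refine the generated cases
sklab generate --force                   # regenerate after improving SKILL.md
```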
## Next Steps
### Trigger Testing

Learn how to run and interpret trigger test results.

### Static Analysis

Validate skill structure and content quality.