
Prerequisites

Make sure you have Python 3.10+ and Skill Lab installed.

Your First Evaluation

Let’s evaluate a skill using Skill Lab’s static analysis.
1. Create a sample skill

Create a directory with a SKILL.md file:
```bash
mkdir my-skill
cd my-skill
```
Create SKILL.md with this content:
SKILL.md
````markdown
---
name: my-skill
description: A sample skill that demonstrates basic functionality.
---

# My Skill

This skill helps with sample tasks.

## Examples

```python
def hello():
    return "Hello from my-skill!"
```

## Usage

Use this skill when you need to demonstrate basic functionality.
````

2. Run your first evaluation
From the skill directory, run:

```bash
sklab evaluate
```
You’ll see a rich-formatted report with:
  • Quality score (0-100)
  • Check results by dimension
  • Passed and failed checks
  • Suggestions for improvement
The sklab evaluate command defaults to the current directory. You can also specify a path: sklab evaluate ./my-skill
3. Review the results

The output shows:
  • Quality Score: Overall score based on weighted checks
  • Dimension Summary: Results grouped by Structure, Naming, Description, Content
  • Failed Checks: Detailed messages for any issues found
  • Suggestions: Recommendations for quality improvements

Common Commands

Now that you’ve run your first evaluation, try these commands:
```bash
# Full quality report for the skill in the current directory
sklab evaluate

# Returns exit code 0 if valid, 1 if invalid
sklab validate ./my-skill

# List all available checks
sklab list-checks
```
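Because sklab validate signals validity through its exit code, it is easy to script batch validation. The following is a minimal sketch, assuming `sklab` is on your PATH and that each skill lives in its own subdirectory; the `failing_skills` helper and directory layout are hypothetical, not part of Skill Lab itself:

```python
# Sketch: batch-validate skills using the documented exit code
# (0 = valid, 1 = invalid). Assumes `sklab` is on PATH; the helper
# and layout here are illustrative, not part of Skill Lab.
import subprocess
from pathlib import Path

def failing_skills(root: str) -> list[str]:
    """Return paths of skill directories that fail `sklab validate`."""
    failed = []
    for skill_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        result = subprocess.run(["sklab", "validate", str(skill_dir)])
        if result.returncode != 0:  # non-zero exit means invalid
            failed.append(str(skill_dir))
    return failed
```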

Understanding the Output

Here’s what the evaluation report includes:
A weighted 0-100 score based on check results:
  • 90-100: Excellent quality
  • 70-89: Good quality, minor improvements needed
  • 50-69: Fair quality, several issues to address
  • Below 50: Needs significant improvement
Checks are organized into 4 dimensions:
  • Structure: File existence, frontmatter format, standard fields
  • Naming: Skill name format, directory matching
  • Description: Required fields, max length, non-empty
  • Content: Examples, line budget, reference validation
Each check has a severity:
  • Error: Must fix for spec compliance
  • Warning: Important quality issues
  • Info: Suggestions for best practices
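The exact weighting is defined in the Quality Scoring guide, but the idea of a severity-weighted 0-100 score can be sketched as follows. The weights and `quality_score` helper here are illustrative placeholders, not Skill Lab's real formula:

```python
# Illustrative severity-weighted score; the real weights come from
# Skill Lab's Quality Scoring guide, so these values are placeholders.
SEVERITY_WEIGHTS = {"error": 3.0, "warning": 2.0, "info": 1.0}

def quality_score(checks: list[tuple[str, bool]]) -> float:
    """checks: (severity, passed) pairs; returns a 0-100 score."""
    total = sum(SEVERITY_WEIGHTS[sev] for sev, _ in checks)
    if total == 0:
        return 100.0  # no checks ran; nothing to penalize
    earned = sum(SEVERITY_WEIGHTS[sev] for sev, passed in checks if passed)
    return round(100.0 * earned / total, 1)

# One failed info-level check out of three lands in the "good quality" band.
sample = [("error", True), ("warning", True), ("info", False)]
score = quality_score(sample)  # 5.0 of 6.0 weight -> 83.3
```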

Advanced: Generate Trigger Tests

This requires Anthropic API key setup and the generate extra.
Generate LLM-powered trigger test cases:
1. Generate test cases

```bash
sklab generate ./my-skill
```
This creates .skill-lab/tests/triggers.yaml with ~13 test cases across 4 trigger types.
2. Review the tests

Open .skill-lab/tests/triggers.yaml to see:
  • Explicit triggers (direct skill name)
  • Implicit triggers (need description)
  • Contextual triggers (realistic prompts)
  • Negative triggers (should NOT activate)
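To make the four trigger types concrete, here is a hypothetical in-memory version of such test cases; the actual triggers.yaml schema may differ from this sketch:

```python
# Hypothetical in-memory form of generated trigger tests; the real
# triggers.yaml schema may differ from this sketch.
from collections import Counter

trigger_tests = [
    {"type": "explicit",   "prompt": "Use the my-skill skill to say hello.",     "should_activate": True},
    {"type": "implicit",   "prompt": "I need a demo of basic functionality.",    "should_activate": True},
    {"type": "contextual", "prompt": "I'm preparing a sample-task walkthrough.", "should_activate": True},
    {"type": "negative",   "prompt": "Summarize this quarterly report for me.",  "should_activate": False},
]

def by_type(tests):
    """Count how many test cases exist for each trigger type."""
    return Counter(t["type"] for t in tests)

# Negative triggers are the cases where the skill should NOT activate.
negatives = [t for t in trigger_tests if not t["should_activate"]]
```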
3. Run the tests (optional)

Requires Claude CLI to be installed.

```bash
sklab trigger ./my-skill
```

Practical Examples

Use in GitHub Actions or other CI:
.github/workflows/validate-skills.yml
```yaml
name: Validate Skills
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - run: pip install skill-lab
      - run: sklab validate ./skills
```

Next Steps

Now that you’ve run your first evaluation, explore more features:

  • Static Analysis Guide: Deep dive into the 19 static checks
  • Trigger Testing Guide: Learn about trigger testing in detail
  • Quality Scoring: Understand how scores are calculated
  • Output Formats: Master console and JSON output
  • Skill Format: Learn the SKILL.md format specification
  • Check Catalog: Browse all available checks

Common Issues

Make sure you’re in a directory with a SKILL.md file, or specify the path:
```bash
sklab evaluate ./path/to/skill
```
Ensure your frontmatter is valid YAML:
```yaml
---
name: my-skill
description: This is a description.
---
```
Common issues:
  • Missing closing ---
  • Invalid YAML syntax
  • Non-string field values
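A quick way to catch these issues before running sklab is a small sanity check on the frontmatter. This minimal parser is a sketch for the simple `key: value` layout shown above, not Skill Lab's actual validator (a real tool would use a proper YAML parser):

```python
# Minimal frontmatter sanity check for the simple `key: value` layout
# shown above. Illustrative only; not Skill Lab's real validator.
def check_frontmatter(text: str) -> list[str]:
    """Return a list of human-readable issues (empty list = looks OK)."""
    issues = []
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing opening ---"]
    try:
        end = lines[1:].index("---") + 1  # locate the closing delimiter
    except ValueError:
        return ["missing closing ---"]
    for line in lines[1:end]:
        if ":" not in line:
            issues.append(f"invalid line: {line!r}")
            continue
        key, _, value = line.partition(":")
        if not value.strip():
            issues.append(f"empty value for {key.strip()!r}")
    return issues

good = "---\nname: my-skill\ndescription: This is a description.\n---\n"
# check_frontmatter(good) -> []
```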
Run with --verbose to see all checks:
```bash
sklab evaluate ./my-skill --verbose
```
This shows both passing and failing checks, helping you identify improvements.
Pro tip: Use sklab list-checks --spec-only to see only the checks required by the Agent Skills spec, excluding quality suggestions.
