This guide covers testing practices, conventions, and patterns used in Skill Lab.

Test Organization

Test Structure

Tests are organized in the tests/ directory, mirroring the source code structure:
tests/
├── __init__.py
├── conftest.py              # Shared fixtures and configuration
├── test_checks.py           # Static check tests (28 checks)
├── test_cli.py              # CLI command tests
├── test_evaluator.py        # Evaluator tests
├── test_generator.py        # LLM-based test generation
├── test_parsers.py          # Parser tests
├── test_prompt_exporter.py  # Prompt export tests
├── test_scoring.py          # Scoring algorithm tests
├── test_tokens.py           # Token estimation tests
├── test_trace_check_loader.py    # Trace check loader tests
├── test_trace_evaluator.py       # Trace evaluator tests
├── test_trace_handlers.py        # Trace handler tests
├── test_triggers.py              # Trigger testing tests
└── fixtures/
    └── skills/              # Mock skill directories
        ├── valid-skill/
        ├── missing-frontmatter/
        └── invalid-yaml/

Fixtures

Test fixtures are organized in tests/fixtures/skills/. Each subdirectory represents a complete skill with a SKILL.md file and optional supporting files.
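
A fixture directory typically looks like the sketch below; SKILL.md is required, while the supporting directories shown here (scripts/, references/) are illustrative and vary per fixture:
tests/fixtures/skills/valid-skill/
├── SKILL.md          # Frontmatter and body (required)
├── scripts/          # Optional supporting files
└── references/       # Optional reference documents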

Creating Test Fixtures

# tests/conftest.py
import pytest
from pathlib import Path

@pytest.fixture
def fixtures_dir() -> Path:
    """Return path to fixtures directory."""
    return Path(__file__).parent / "fixtures" / "skills"

@pytest.fixture
def valid_skill(fixtures_dir: Path) -> Path:
    """Return path to a valid skill fixture."""
    return fixtures_dir / "valid-skill"

Running Tests

Basic Commands

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=skill_lab --cov-report=html

# Run specific test file
pytest tests/test_checks.py -v

# Run specific test function
pytest tests/test_checks.py::test_skill_md_exists -v

# Run tests matching a pattern
pytest tests/ -k "schema" -v

Coverage Reports

Generate and view coverage reports:
# Generate HTML coverage report
pytest tests/ --cov=skill_lab --cov-report=html

# Open the report in your browser
open htmlcov/index.html  # macOS
xdg-open htmlcov/index.html  # Linux
start htmlcov/index.html  # Windows
The coverage configuration in pyproject.toml excludes common patterns like __repr__, if TYPE_CHECKING:, and if __name__ == "__main__":.
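
As a rough sketch of that configuration (see pyproject.toml for the exact patterns), the exclusions live under [tool.coverage.report]:
[tool.coverage.report]
exclude_lines = [
    "def __repr__",
    "if TYPE_CHECKING:",
    "if __name__ == .__main__.:",
]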

Testing Patterns

Testing Static Checks

There are two patterns for testing static checks, depending on the check type:

1. Schema-Based Checks (Registry Lookup)

For checks defined in schema.py using FieldRule:
from skill_lab.core.registry import registry

def _get_check(check_id: str):
    """Helper to get a check class by ID from the registry."""
    check_class = registry.get(check_id)
    if check_class is None:
        raise ValueError(f"Check not found: {check_id}")
    return check_class

def test_schema_check():
    # Get check from registry
    check_class = _get_check("schema.name-exists")
    
    # Create a skill whose metadata is missing the name field
    skill = create_skill_with_metadata({"description": "Test"})
    
    # Run check
    result = check_class().run(skill)
    
    # Assertions
    assert not result.passed
    assert "name" in result.message.lower()

2. Behavioral Checks (Direct Import)

For checks with a custom implementation, import the check class directly:
from skill_lab.checks.static.naming import NameMatchesDirectoryCheck
from skill_lab.core.models import Skill, SkillMetadata
from pathlib import Path

def test_name_matches_directory():
    # Create skill with matching name
    skill = Skill(
        path=Path("/path/to/my-skill"),
        metadata=SkillMetadata(
            name="my-skill",
            description="Test skill",
            raw={}
        ),
        body="Test content",
        has_scripts=False,
        has_references=False,
        has_assets=False,
        parse_errors=()
    )
    
    # Run check
    result = NameMatchesDirectoryCheck().run(skill)
    
    # Assertions
    assert result.passed
    assert result.check_id == "naming.matches-directory"

Testing with Fixtures

Use pytest fixtures for common test data:
import pytest
from pathlib import Path
from skill_lab.parsers.skill_parser import parse_skill

@pytest.fixture
def valid_skill_path(tmp_path: Path) -> Path:
    """Create a temporary valid skill."""
    skill_dir = tmp_path / "test-skill"
    skill_dir.mkdir()
    
    skill_md = skill_dir / "SKILL.md"
    skill_md.write_text("""---
name: test-skill
description: A test skill
---

This is the skill body.
""")
    
    return skill_dir

def test_parser_with_fixture(valid_skill_path: Path):
    """Test parser with a fixture."""
    skill = parse_skill(valid_skill_path)
    
    assert skill.metadata is not None
    assert skill.metadata.name == "test-skill"
    assert skill.body.strip() == "This is the skill body."

Testing Error Cases

Test both success and failure paths:
from skill_lab.checks.static.content import BodyNotEmptyCheck

def test_body_not_empty_pass():
    """Test that check passes with non-empty body."""
    skill = create_skill_with_body("Content here")
    result = BodyNotEmptyCheck().run(skill)
    assert result.passed

def test_body_not_empty_fail():
    """Test that check fails with empty body."""
    skill = create_skill_with_body("")
    result = BodyNotEmptyCheck().run(skill)
    assert not result.passed
    assert "empty" in result.message.lower()

def test_body_not_empty_whitespace_only():
    """Test that check fails with whitespace-only body."""
    skill = create_skill_with_body("   \n\t  ")
    result = BodyNotEmptyCheck().run(skill)
    assert not result.passed

Testing CLI Commands

Use Typer’s testing utilities:
from typer.testing import CliRunner
from skill_lab.cli import app

runner = CliRunner()

def test_evaluate_command():
    """Test the evaluate command."""
    result = runner.invoke(app, ["evaluate", "./tests/fixtures/skills/valid-skill"])
    
    assert result.exit_code == 0
    assert "Quality Score" in result.stdout

def test_validate_command_pass():
    """Test validate command with valid skill."""
    result = runner.invoke(app, ["validate", "./tests/fixtures/skills/valid-skill"])
    
    assert result.exit_code == 0
    assert "PASS" in result.stdout

def test_validate_command_fail():
    """Test validate command with invalid skill."""
    result = runner.invoke(app, ["validate", "./tests/fixtures/skills/invalid-skill"])
    
    assert result.exit_code == 1
    assert "FAIL" in result.stdout
CLI tests should use the CliRunner from Typer, not pytest’s capsys. This ensures proper isolation and error handling.
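
When a CLI test fails unexpectedly, it can also help to let exceptions propagate instead of having the runner capture them; catch_exceptions is a standard Click/Typer CliRunner.invoke option (sketch below, reusing the runner and app from above):
def test_validate_command_debug():
    """Let exceptions propagate so pytest shows the full traceback."""
    result = runner.invoke(
        app,
        ["validate", "./tests/fixtures/skills/valid-skill"],
        catch_exceptions=False,
    )
    assert result.exit_code == 0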

Testing Parsers

Test parser behavior with various input formats:
from skill_lab.parsers.skill_parser import parse_skill
from pathlib import Path

def test_parse_valid_skill(tmp_path: Path):
    """Test parsing a valid SKILL.md file."""
    skill_dir = tmp_path / "test-skill"
    skill_dir.mkdir()
    
    (skill_dir / "SKILL.md").write_text("""
---
name: test-skill
description: Test description
---

Content here.
    """)
    
    skill = parse_skill(skill_dir)
    
    assert skill.metadata is not None
    assert skill.metadata.name == "test-skill"
    assert skill.metadata.description == "Test description"
    assert "Content here" in skill.body

def test_parse_missing_frontmatter(tmp_path: Path):
    """Test parsing skill without frontmatter."""
    skill_dir = tmp_path / "test-skill"
    skill_dir.mkdir()
    
    (skill_dir / "SKILL.md").write_text("Just content, no frontmatter.")
    
    skill = parse_skill(skill_dir)
    
    assert skill.metadata is None
    assert "Just content" in skill.body

def test_parse_invalid_yaml(tmp_path: Path):
    """Test parsing skill with invalid YAML."""
    skill_dir = tmp_path / "test-skill"
    skill_dir.mkdir()
    
    (skill_dir / "SKILL.md").write_text("""
---
name: [invalid yaml
description: test
---

Content.
    """)
    
    skill = parse_skill(skill_dir)
    
    assert len(skill.parse_errors) > 0

Testing Trace Handlers

Test trace analysis handlers:
from pathlib import Path

from skill_lab.tracechecks.handlers.command_presence import CommandPresenceHandler
from skill_lab.parsers.trace_parser import TraceAnalyzer, TraceEvent
from skill_lab.tracechecks.trace_check_loader import TraceCheckDefinition

def test_command_presence_handler():
    """Test command presence handler."""
    # Create trace events
    events = [
        TraceEvent(
            type="item.completed",
            item_type="command_execution",
            command="npm install",
            output="installed packages",
            timestamp="2024-01-01T00:00:00Z",
            raw={}
        )
    ]
    
    # Create analyzer
    analyzer = TraceAnalyzer(events)
    
    # Create check definition
    check = TraceCheckDefinition(
        id="test-check",
        type="command_presence",
        pattern="npm install"
    )
    
    # Run handler
    handler = CommandPresenceHandler()
    result = handler.run(check, analyzer, Path("/project"))
    
    # Assertions
    assert result.passed
    assert "npm install" in result.message

Pytest Configuration

The pyproject.toml file contains pytest configuration:
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
python_functions = "test_*"
addopts = "-v --tb=short"
filterwarnings = [
    "ignore::DeprecationWarning",
]

Test Discovery

  • Test files: Must be named test_*.py
  • Test functions: Must be named test_*
  • Test classes: Must be named Test*
  • Test methods: Must be named test_*
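
Class-based tests follow the same naming conventions. A minimal sketch, reusing the check and helper from the error-case examples above:
from skill_lab.checks.static.content import BodyNotEmptyCheck

class TestBodyNotEmptyCheck:
    """Related cases for a single check grouped in one class."""

    def test_passes_with_content(self):
        skill = create_skill_with_body("Content here")
        assert BodyNotEmptyCheck().run(skill).passed

    def test_fails_when_empty(self):
        skill = create_skill_with_body("")
        assert not BodyNotEmptyCheck().run(skill).passed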

Writing Effective Tests

Best Practices

1. Use descriptive names

Test names should clearly describe what they test:
# Good
def test_name_field_required_when_missing():
    ...

# Bad
def test_name():
    ...

2. Test one thing at a time

Each test should verify a single behavior:
# Good - separate tests
def test_check_passes_with_valid_name():
    ...

def test_check_fails_with_missing_name():
    ...

# Bad - testing multiple things
def test_name_validation():
    # Tests both valid and invalid cases
    ...

3. Use fixtures for common setup

Extract common setup into fixtures:
@pytest.fixture
def skill_with_metadata():
    return create_skill_with_metadata({
        "name": "test-skill",
        "description": "Test description"
    })

def test_with_fixture(skill_with_metadata):
    result = SomeCheck().run(skill_with_metadata)
    assert result.passed

4. Test edge cases

Don’t just test the happy path:
def test_empty_string():
    ...

def test_whitespace_only():
    ...

def test_special_characters():
    ...

def test_unicode():
    ...

def test_very_long_input():
    ...

5. Use assertions effectively

Make assertions specific and clear:
# Good
assert result.passed
assert result.check_id == "schema.name-exists"
assert "name" in result.message.lower()

# Bad
assert result  # Too vague

Common Patterns

Parametrized Tests

Test multiple cases with the same logic:
import pytest

@pytest.mark.parametrize(
    "name,expected_valid",
    [
        ("valid-name", True),
        ("valid_name", True),
        ("Invalid Name", False),
        ("invalid-name!", False),
        ("", False),
    ]
)
def test_name_validation(name: str, expected_valid: bool):
    skill = create_skill_with_name(name)
    result = NameValidationCheck().run(skill)
    assert result.passed == expected_valid

Testing Exceptions

Test that exceptions are raised correctly:
import pytest
from skill_lab.core.exceptions import ParseError

def test_parser_raises_on_invalid_yaml():
    with pytest.raises(ParseError) as exc_info:
        parse_invalid_yaml("[invalid yaml")
    
    assert "YAML" in str(exc_info.value)
    assert exc_info.value.context is not None

Mocking External Dependencies

Use unittest.mock for external dependencies:
from unittest.mock import Mock, patch

@patch("skill_lab.triggers.generator.Anthropic")
def test_generator_with_mock_llm(mock_anthropic):
    # Configure mock
    mock_client = Mock()
    mock_anthropic.return_value = mock_client
    mock_client.messages.create.return_value = Mock(
        content=[Mock(text="Generated test cases")]
    )
    
    # Test code
    generator = TriggerGenerator(api_key="test-key")
    result = generator.generate_tests(skill)
    
    # Assertions
    assert mock_client.messages.create.called

Continuous Integration

Tests are automatically run on CI for:
  • All pull requests
  • Commits to main branch
  • Release tags

CI Checks

The CI pipeline runs:
  1. Unit tests - All tests in tests/
  2. Coverage - Minimum 80% coverage required
  3. Type checking - mypy src/ in strict mode
  4. Linting - ruff check src/
  5. Formatting - ruff format --check src/
All CI checks must pass before a PR can be merged. Run these checks locally to catch issues early.
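
A local run that mirrors the pipeline looks roughly like this (--cov-fail-under assumes pytest-cov, and mypy's strict settings may instead live in pyproject.toml):
# Run the same checks CI runs
pytest tests/ --cov=skill_lab --cov-fail-under=80
mypy src/ --strict
ruff check src/
ruff format --check src/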

Debugging Tests

Using pytest Debugging Features

# Drop into debugger on failure
pytest tests/ --pdb

# Show local variables on failure
pytest tests/ -l

# Show print statements
pytest tests/ -s

# Verbose output
pytest tests/ -vv

# Show slowest tests
pytest tests/ --durations=10

Using Python Debugger

Add breakpoints in test code:
def test_complex_logic():
    result = complex_function()
    
    # Drop into debugger here
    import pdb; pdb.set_trace()
    
    assert result.is_valid()

Test Coverage Goals

  • Overall: Maintain >80% coverage
  • Core modules: >90% coverage for core/, checks/, parsers/
  • CLI: >70% coverage (some paths are hard to test)
  • New code: All new features must have tests
Don’t chase 100% coverage. Focus on testing critical paths and edge cases rather than hitting arbitrary coverage numbers.
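
To measure a specific area against these goals, narrow the coverage target; the package paths below match modules referenced in this guide:
# Per-module coverage with uncovered lines listed
pytest tests/ --cov=skill_lab.core --cov=skill_lab.checks --cov-report=term-missing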
