This guide covers testing practices, conventions, and patterns used in Skill Lab.
Test Organization
Test Structure
Tests are organized in the tests/ directory, mirroring the source code structure:
tests/
├── __init__.py
├── conftest.py # Shared fixtures and configuration
├── test_checks.py # Static check tests (28 checks)
├── test_cli.py # CLI command tests
├── test_evaluator.py # Evaluator tests
├── test_generator.py # LLM-based test generation
├── test_parsers.py # Parser tests
├── test_prompt_exporter.py # Prompt export tests
├── test_scoring.py # Scoring algorithm tests
├── test_tokens.py # Token estimation tests
├── test_trace_check_loader.py # Trace check loader tests
├── test_trace_evaluator.py # Trace evaluator tests
├── test_trace_handlers.py # Trace handler tests
├── test_triggers.py # Trigger testing tests
└── fixtures/
    └── skills/                 # Mock skill directories
        ├── valid-skill/
        ├── missing-frontmatter/
        └── invalid-yaml/
Fixtures
Test fixtures are organized in tests/fixtures/skills/. Each subdirectory represents a complete skill with a SKILL.md file and optional supporting files.
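For illustration, the SKILL.md inside a fixture like valid-skill/ might look like the following (illustrative content only; the actual fixture files in the repository may differ):

```markdown
---
name: valid-skill
description: A minimal valid skill used as a test fixture
---

This is the skill body. It gives parser and check tests a
known-good input to run against.
```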
Creating Test Fixtures
# tests/conftest.py
import pytest
from pathlib import Path


@pytest.fixture
def fixtures_dir() -> Path:
    """Return path to fixtures directory."""
    return Path(__file__).parent / "fixtures" / "skills"


@pytest.fixture
def valid_skill(fixtures_dir: Path) -> Path:
    """Return path to a valid skill fixture."""
    return fixtures_dir / "valid-skill"
Running Tests
Basic Commands
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=skill_lab --cov-report=html
# Run specific test file
pytest tests/test_checks.py -v
# Run specific test function
pytest tests/test_checks.py::test_skill_md_exists -v
# Run tests matching a pattern
pytest tests/ -k "schema" -v
Coverage Reports
Generate and view coverage reports:
# Generate HTML coverage report
pytest tests/ --cov=skill_lab --cov-report=html
# Open the report in your browser
open htmlcov/index.html # macOS
xdg-open htmlcov/index.html # Linux
start htmlcov/index.html # Windows
The coverage configuration in pyproject.toml excludes common patterns like __repr__, if TYPE_CHECKING:, and if __name__ == "__main__":.
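A sketch of what that configuration typically looks like (check pyproject.toml for the exact entries used in this project):

```toml
[tool.coverage.report]
exclude_lines = [
    "def __repr__",
    "if TYPE_CHECKING:",
    'if __name__ == "__main__":',
]
```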
Testing Patterns
Testing Static Checks
There are two patterns for testing static checks, depending on the check type:
1. Schema-Based Checks (Registry Lookup)
For checks defined in schema.py using FieldRule:
from skill_lab.core.registry import registry


def _get_check(check_id: str):
    """Helper to get a check class by ID from the registry."""
    check_class = registry.get(check_id)
    if check_class is None:
        raise ValueError(f"Check not found: {check_id}")
    return check_class


def test_schema_check():
    # Get check from registry
    check_class = _get_check("schema.name-exists")

    # Create skill fixture
    skill = create_skill_with_metadata({"description": "Test"})

    # Run check
    result = check_class().run(skill)

    # Assertions
    assert not result.passed
    assert "name" in result.message.lower()
2. Behavioral Checks (Direct Import)
For checks with custom implementation:
from pathlib import Path

from skill_lab.checks.static.naming import NameMatchesDirectoryCheck
from skill_lab.core.models import Skill, SkillMetadata


def test_name_matches_directory():
    # Create skill with matching name
    skill = Skill(
        path=Path("/path/to/my-skill"),
        metadata=SkillMetadata(
            name="my-skill",
            description="Test skill",
            raw={},
        ),
        body="Test content",
        has_scripts=False,
        has_references=False,
        has_assets=False,
        parse_errors=(),
    )

    # Run check
    result = NameMatchesDirectoryCheck().run(skill)

    # Assertions
    assert result.passed
    assert result.check_id == "naming.matches-directory"
Testing with Fixtures
Use pytest fixtures for common test data:
import pytest
from pathlib import Path

from skill_lab.parsers.skill_parser import parse_skill


@pytest.fixture
def valid_skill_path(tmp_path: Path) -> Path:
    """Create a temporary valid skill."""
    skill_dir = tmp_path / "test-skill"
    skill_dir.mkdir()
    skill_md = skill_dir / "SKILL.md"
    # No leading newline: the frontmatter fence must be the first line.
    skill_md.write_text(
        """---
name: test-skill
description: A test skill
---
This is the skill body.
"""
    )
    return skill_dir


def test_parser_with_fixture(valid_skill_path: Path):
    """Test parser with a fixture."""
    skill = parse_skill(valid_skill_path)
    assert skill.metadata is not None
    assert skill.metadata.name == "test-skill"
    assert skill.body.strip() == "This is the skill body."
Testing Error Cases
Test both success and failure paths:
from skill_lab.checks.static.content import BodyNotEmptyCheck


def test_body_not_empty_pass():
    """Test that check passes with non-empty body."""
    skill = create_skill_with_body("Content here")
    result = BodyNotEmptyCheck().run(skill)
    assert result.passed


def test_body_not_empty_fail():
    """Test that check fails with empty body."""
    skill = create_skill_with_body("")
    result = BodyNotEmptyCheck().run(skill)
    assert not result.passed
    assert "empty" in result.message.lower()


def test_body_not_empty_whitespace_only():
    """Test that check fails with whitespace-only body."""
    skill = create_skill_with_body("   \n\t  ")
    result = BodyNotEmptyCheck().run(skill)
    assert not result.passed
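The predicate behind such a check is simple. A standalone sketch (not the actual BodyNotEmptyCheck implementation, which also builds a CheckResult with a message):

```python
def body_is_empty(body: str) -> bool:
    # A body counts as empty when it contains only whitespace.
    return body.strip() == ""


# The three test cases above map directly onto this predicate:
assert not body_is_empty("Content here")  # check passes
assert body_is_empty("")                  # check fails
assert body_is_empty("   \n\t  ")         # check fails
```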
Testing CLI Commands
Use Typer’s testing utilities:
from typer.testing import CliRunner

from skill_lab.cli import app

runner = CliRunner()


def test_evaluate_command():
    """Test the evaluate command."""
    result = runner.invoke(app, ["evaluate", "./tests/fixtures/skills/valid-skill"])
    assert result.exit_code == 0
    assert "Quality Score" in result.stdout


def test_validate_command_pass():
    """Test validate command with valid skill."""
    result = runner.invoke(app, ["validate", "./tests/fixtures/skills/valid-skill"])
    assert result.exit_code == 0
    assert "PASS" in result.stdout


def test_validate_command_fail():
    """Test validate command with invalid skill."""
    result = runner.invoke(app, ["validate", "./tests/fixtures/skills/invalid-skill"])
    assert result.exit_code == 1
    assert "FAIL" in result.stdout
CLI tests should use the CliRunner from Typer, not pytest’s capsys. This ensures proper isolation and error handling.
Testing Parsers
Test parser behavior with various input formats:
from pathlib import Path

from skill_lab.parsers.skill_parser import parse_skill


def test_parse_valid_skill(tmp_path: Path):
    """Test parsing a valid SKILL.md file."""
    skill_dir = tmp_path / "test-skill"
    skill_dir.mkdir()
    # No leading newline: the frontmatter fence must be the first line.
    (skill_dir / "SKILL.md").write_text(
        """---
name: test-skill
description: Test description
---
Content here.
"""
    )
    skill = parse_skill(skill_dir)
    assert skill.metadata is not None
    assert skill.metadata.name == "test-skill"
    assert skill.metadata.description == "Test description"
    assert "Content here" in skill.body


def test_parse_missing_frontmatter(tmp_path: Path):
    """Test parsing skill without frontmatter."""
    skill_dir = tmp_path / "test-skill"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text("Just content, no frontmatter.")
    skill = parse_skill(skill_dir)
    assert skill.metadata is None
    assert "Just content" in skill.body


def test_parse_invalid_yaml(tmp_path: Path):
    """Test parsing skill with invalid YAML."""
    skill_dir = tmp_path / "test-skill"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        """---
name: [invalid yaml
description: test
---
Content.
"""
    )
    skill = parse_skill(skill_dir)
    assert len(skill.parse_errors) > 0
Testing Trace Handlers
Test trace analysis handlers:
from pathlib import Path

from skill_lab.parsers.trace_parser import TraceAnalyzer, TraceEvent
from skill_lab.tracechecks.handlers.command_presence import CommandPresenceHandler
from skill_lab.tracechecks.trace_check_loader import TraceCheckDefinition


def test_command_presence_handler():
    """Test command presence handler."""
    # Create trace events
    events = [
        TraceEvent(
            type="item.completed",
            item_type="command_execution",
            command="npm install",
            output="installed packages",
            timestamp="2024-01-01T00:00:00Z",
            raw={},
        )
    ]

    # Create analyzer
    analyzer = TraceAnalyzer(events)

    # Create check definition
    check = TraceCheckDefinition(
        id="test-check",
        type="command_presence",
        pattern="npm install",
    )

    # Run handler
    handler = CommandPresenceHandler()
    result = handler.run(check, analyzer, Path("/project"))

    # Assertions
    assert result.passed
    assert "npm install" in result.message
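The matching logic such a handler typically implements can be sketched standalone (illustrative only; the real CommandPresenceHandler works against a TraceAnalyzer, not raw dicts):

```python
import re


def command_present(events: list[dict], pattern: str) -> bool:
    # True if any command_execution event's command matches the pattern.
    regex = re.compile(pattern)
    return any(
        event.get("item_type") == "command_execution"
        and regex.search(event.get("command", ""))
        for event in events
    )


events = [{"item_type": "command_execution", "command": "npm install"}]
assert command_present(events, "npm install")
assert not command_present(events, "npm test")
```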
Pytest Configuration
The pyproject.toml file contains pytest configuration:
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = "test_*.py"
python_functions = "test_*"
addopts = "-v --tb=short"
filterwarnings = [
    "ignore::DeprecationWarning",
]
Test Discovery
- Test files: Must be named test_*.py
- Test functions: Must be named test_*
- Test classes: Must be named Test*
- Test methods: Must be named test_*
Writing Effective Tests
Best Practices
Use descriptive names
Test names should clearly describe what they test:

# Good
def test_name_field_required_when_missing():
    ...

# Bad
def test_name():
    ...
Test one thing at a time
Each test should verify a single behavior:

# Good - separate tests
def test_check_passes_with_valid_name():
    ...

def test_check_fails_with_missing_name():
    ...

# Bad - testing multiple things
def test_name_validation():
    # Tests both valid and invalid cases
    ...
Use fixtures for common setup
Extract common setup into fixtures:

@pytest.fixture
def skill_with_metadata():
    return create_skill_with_metadata({
        "name": "test-skill",
        "description": "Test description",
    })

def test_with_fixture(skill_with_metadata):
    result = SomeCheck().run(skill_with_metadata)
    assert result.passed
Test edge cases
Don’t just test the happy path:

def test_empty_string():
    ...

def test_whitespace_only():
    ...

def test_special_characters():
    ...

def test_unicode():
    ...

def test_very_long_input():
    ...
Use assertions effectively
Make assertions specific and clear:

# Good
assert result.passed
assert result.check_id == "schema.name-exists"
assert "name" in result.message.lower()

# Bad
assert result  # Too vague
Common Patterns
Parametrized Tests
Test multiple cases with the same logic:
import pytest


@pytest.mark.parametrize(
    "name,expected_valid",
    [
        ("valid-name", True),
        ("valid_name", True),
        ("Invalid Name", False),
        ("invalid-name!", False),
        ("", False),
    ],
)
def test_name_validation(name: str, expected_valid: bool):
    skill = create_skill_with_name(name)
    result = NameValidationCheck().run(skill)
    assert result.passed == expected_valid
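A naming rule consistent with the table above (hypothetical; the actual NameValidationCheck may enforce different constraints) is "lowercase alphanumerics and underscores, hyphen-separated":

```python
import re

# Hypothetical rule matching the parametrized cases above.
NAME_RE = re.compile(r"^[a-z0-9_]+(?:-[a-z0-9_]+)*$")


def is_valid_name(name: str) -> bool:
    return bool(NAME_RE.match(name))


assert is_valid_name("valid-name")
assert is_valid_name("valid_name")
assert not is_valid_name("Invalid Name")   # spaces, uppercase
assert not is_valid_name("invalid-name!")  # punctuation
assert not is_valid_name("")               # empty
```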
Testing Exceptions
Test that exceptions are raised correctly:
import pytest

from skill_lab.core.exceptions import ParseError


def test_parser_raises_on_invalid_yaml():
    with pytest.raises(ParseError) as exc_info:
        parse_invalid_yaml("[invalid yaml")

    assert "YAML" in str(exc_info.value)
    assert exc_info.value.context is not None
Mocking External Dependencies
Use unittest.mock for external dependencies:
from unittest.mock import Mock, patch


@patch("skill_lab.triggers.generator.Anthropic")
def test_generator_with_mock_llm(mock_anthropic):
    # Configure mock
    mock_client = Mock()
    mock_anthropic.return_value = mock_client
    mock_client.messages.create.return_value = Mock(
        content=[Mock(text="Generated test cases")]
    )

    # Test code
    generator = TriggerGenerator(api_key="test-key")
    result = generator.generate_tests(skill)

    # Assertions
    assert mock_client.messages.create.called
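The mock configuration above can be exercised on its own with nothing but the standard library, which is a handy way to sanity-check the shape of a canned response before wiring it into a test:

```python
from unittest.mock import Mock

# Build the same mocked client shape as in the test above.
mock_client = Mock()
mock_client.messages.create.return_value = Mock(
    content=[Mock(text="Generated test cases")]
)

# Any code that calls client.messages.create(...) now gets the canned response.
response = mock_client.messages.create(model="test", messages=[])
assert response.content[0].text == "Generated test cases"
assert mock_client.messages.create.called
```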
Continuous Integration
Tests are automatically run on CI for:
- All pull requests
- Commits to main branch
- Release tags
CI Checks
The CI pipeline runs:
- Unit tests - All tests in tests/
- Coverage - Minimum 80% coverage required
- Type checking - mypy src/ in strict mode
- Linting - ruff check src/
- Formatting - ruff format --check src/
All CI checks must pass before a PR can be merged. Run these checks locally to catch issues early.
Debugging Tests
Using pytest Debugging Features
# Drop into debugger on failure
pytest tests/ --pdb
# Show local variables on failure
pytest tests/ -l
# Show print statements
pytest tests/ -s
# Verbose output
pytest tests/ -vv
# Show slowest tests
pytest tests/ --durations=10
Using Python Debugger
Add breakpoints in test code:
def test_complex_logic():
    result = complex_function()

    # Drop into debugger here
    import pdb; pdb.set_trace()

    assert result.is_valid()
Test Coverage Goals
- Overall: Maintain >80% coverage
- Core modules: >90% coverage for core/, checks/, parsers/
- CLI: >70% coverage (some paths are hard to test)
- New code: All new features must have tests
Don’t chase 100% coverage. Focus on testing critical paths and edge cases rather than hitting arbitrary coverage numbers.