Skill Lab assigns each skill a quality score from 0 to 100 based on the results of 28 static checks. The score reflects how well the skill follows best practices and the Agent Skills specification.
The quality score is a weighted average of four dimension scores:

```python
quality_score = (
    structure_score * 0.30 +
    naming_score * 0.20 +
    description_score * 0.25 +
    content_score * 0.25
)
```
## Dimension Weights

From `core/scoring.py:12-18`:

| Dimension | Weight | Focus Area |
|---|---|---|
| Structure | 30% | File organization, folder structure, frontmatter validity |
| Naming | 20% | Skill name format and directory matching |
| Description | 25% | Description presence, length, and quality |
| Content | 25% | Body content, examples, token budgets, references |
| Execution | 0% | Evaluated separately via trace analysis (Phase 3) |
The Execution dimension is reserved for trace-based checks that analyze runtime behavior. It does not contribute to the static analysis score.
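The weighted average can be sketched directly from the table above. The dictionary and function names here are illustrative, not Skill Lab's actual API:

```python
# Static-analysis dimension weights (Execution is excluded at weight 0).
DIMENSION_WEIGHTS = {
    "structure": 0.30,
    "naming": 0.20,
    "description": 0.25,
    "content": 0.25,
}

def combine_dimension_scores(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each 0-100)."""
    return sum(scores[dim] * weight for dim, weight in DIMENSION_WEIGHTS.items())

# A skill scoring 100 in every dimension scores 100 overall.
print(combine_dimension_scores(
    {"structure": 100, "naming": 100, "description": 100, "content": 100}
))  # 100.0
```

Because the four weights sum to 1.0, a uniform set of dimension scores passes through unchanged.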
## Dimension Score Calculation

Each dimension score is calculated using severity-weighted results:

```python
def calculate_dimension_score(results: list[CheckResult]) -> float:
    if not results:
        return 100.0
    total_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results)
    passed_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results if r.passed)
    if total_weight == 0:
        return 100.0
    return (passed_weight / total_weight) * 100
```
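To see the function in action, here is a self-contained sketch with minimal stand-ins for `CheckResult` and `SEVERITY_WEIGHTS`; the real definitions live in Skill Lab's codebase and may differ:

```python
from dataclasses import dataclass

# Stand-ins for Skill Lab's types, for illustration only.
SEVERITY_WEIGHTS = {"ERROR": 1.0, "WARNING": 0.5, "INFO": 0.25}

@dataclass
class CheckResult:
    severity: str
    passed: bool

def calculate_dimension_score(results: list[CheckResult]) -> float:
    if not results:
        return 100.0  # a dimension with no checks scores perfectly
    total_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results)
    passed_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results if r.passed)
    if total_weight == 0:
        return 100.0
    return (passed_weight / total_weight) * 100

# Two passing ERROR checks and one failing WARNING: 2.0 / 2.5 = 80%.
results = [
    CheckResult("ERROR", True),
    CheckResult("ERROR", True),
    CheckResult("WARNING", False),
]
print(calculate_dimension_score(results))  # 80.0
```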
## Severity Weights

From `core/scoring.py:21-25`:

| Severity | Weight | Impact |
|---|---|---|
| ERROR | 1.0 | Must fix - spec requirement or critical issue |
| WARNING | 0.5 | Should fix - best practice recommendation |
| INFO | 0.25 | Optional - quality suggestion |
A single failed ERROR check has the same impact as two failed WARNING checks or four failed INFO checks.
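That equivalence can be verified with a little arithmetic. Starting from a baseline of four passing ERROR checks, one failed ERROR, two failed WARNINGs, or four failed INFOs all drag the dimension to the same score:

```python
SEVERITY_WEIGHTS = {"ERROR": 1.0, "WARNING": 0.5, "INFO": 0.25}

# Baseline: four passing ERROR checks contribute 4.0 of passed weight.
base = 4 * SEVERITY_WEIGHTS["ERROR"]

# One failed ERROR adds 1.0 of unpassed weight...
one_error = base / (base + SEVERITY_WEIGHTS["ERROR"]) * 100

# ...exactly as much as two failed WARNINGs or four failed INFOs.
two_warnings = base / (base + 2 * SEVERITY_WEIGHTS["WARNING"]) * 100
four_infos = base / (base + 4 * SEVERITY_WEIGHTS["INFO"]) * 100

print(one_error, two_warnings, four_infos)  # 80.0 80.0 80.0
```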
## Example Calculation

Let's calculate the score for a skill with the following results in the Structure dimension:

| Check | Severity | Passed |
|---|---|---|
| SKILL.md Exists | ERROR (1.0) | ✅ |
| Valid Frontmatter | ERROR (1.0) | ✅ |
| Scripts Valid | WARNING (0.5) | ❌ |
| Standard Fields | WARNING (0.5) | ✅ |

Calculation:

```
total_weight  = 1.0 + 1.0 + 0.5 + 0.5 = 3.0
passed_weight = 1.0 + 1.0 + 0.0 + 0.5 = 2.5
structure_score = (2.5 / 3.0) * 100 = 83.33
```
If all other dimensions score 100%, the final quality score would be:

```
quality_score = (83.33 * 0.30) + (100 * 0.20) + (100 * 0.25) + (100 * 0.25)
              = 25.00 + 20.00 + 25.00 + 25.00
              = 95.00
```
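The worked example above can be checked in a few lines of Python:

```python
# Structure dimension: ERROR passed, ERROR passed, WARNING failed, WARNING passed.
total_weight = 1.0 + 1.0 + 0.5 + 0.5    # 3.0
passed_weight = 1.0 + 1.0 + 0.0 + 0.5   # 2.5
structure_score = (passed_weight / total_weight) * 100  # 83.33...

# All other dimensions are assumed perfect (100).
quality_score = (structure_score * 0.30
                 + 100 * 0.20 + 100 * 0.25 + 100 * 0.25)
print(round(quality_score, 2))  # 95.0
```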
## Check Distribution

The 28 static checks are distributed across dimensions:

### Structure (7 checks)
- SKILL.md Exists (ERROR, spec-required)
- Valid Frontmatter (ERROR, spec-required)
- Standard Frontmatter Fields (WARNING)
- Scripts Valid (WARNING)
- References Valid (WARNING)
- Scripts No Interactive Input (WARNING)
- Scripts Self-Contained (INFO)
### Naming (1 check)
- Name Matches Directory (ERROR, spec-required)
### Description (0 behavioral checks)

All description checks are schema-based.
### Content (11 checks)
- Body Not Empty (WARNING)
- Line Budget (WARNING)
- Has Examples (INFO)
- Reference Depth (WARNING)
- Scripts Referenced (WARNING)
- Script Paths Exist (WARNING)
- Compatibility Prerequisites (INFO)
- Token Budget (WARNING)
- Metadata Token Budget (INFO)
- Description Actionable (INFO)
- Asset Paths Exist (WARNING)
### Schema-Based Checks (9 checks)
- Name Required (ERROR, spec-required)
- Name Format (ERROR, spec-required)
- Description Required (ERROR, spec-required)
- Description Not Empty (ERROR, spec-required)
- Description Max Length (ERROR, spec-required)
- Compatibility Length (ERROR, spec-required)
- Metadata Format (ERROR, spec-required)
- License Format (WARNING)
- Allowed Tools Format (WARNING)
Spec-required checks (10 total) must pass for the skill to be considered spec-compliant. The remaining 18 checks are quality suggestions.
## Spec Compliance vs. Quality
Skill Lab distinguishes between:
- Spec Compliance - Does the skill meet the minimum requirements of the Agent Skills specification?
- Quality Score - How well does the skill follow best practices?
A skill can be spec-compliant (all 10 spec-required checks pass) but still have a lower quality score if it fails quality suggestions.
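The distinction can be illustrated with a small sketch. The field names here are hypothetical, chosen only for the example:

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    spec_required: bool
    passed: bool

def is_spec_compliant(results: list[CheckResult]) -> bool:
    """Compliant only if every spec-required check passed."""
    return all(r.passed for r in results if r.spec_required)

results = [
    CheckResult("SKILL.md Exists", spec_required=True, passed=True),
    CheckResult("Has Examples", spec_required=False, passed=False),
]
print(is_spec_compliant(results))  # True: only a quality suggestion failed
```

Failed quality suggestions lower the score but, as shown, do not affect compliance.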
### Spec-Only Evaluation

To check only spec-required constraints:

```shell
sklab evaluate ./my-skill -s
```

This runs only the 10 spec-required checks and ignores quality suggestions.
## Validation Modes

| Command | Checks Run | Output |
|---|---|---|
| `sklab evaluate` | All 28 checks | Detailed report with quality score |
| `sklab evaluate -s` | 10 spec-required checks only | Spec compliance report |
| `sklab validate` | All 28 checks | Quick pass/fail (exit code 0 or 1) |
## Score Interpretation

| Score Range | Interpretation |
|---|---|
| 90-100 | Excellent - follows best practices |
| 75-89 | Good - minor improvements recommended |
| 60-74 | Fair - several quality issues |
| Below 60 | Poor - significant issues to address |
A skill with a score below 100 may still be functional and spec-compliant. The score reflects adherence to best practices, not correctness.
## JSON Output

For machine-readable output, use the `--json` flag:

```shell
sklab evaluate ./my-skill --json
```
The JSON output includes:

- `quality_score` - Overall score (0-100)
- `overall_pass` - Boolean indicating if all checks passed
- `checks_run`, `checks_passed`, `checks_failed` - Count summaries
- `results` - Array of individual check results
- `summary` - Breakdown by severity and dimension
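A minimal sketch of consuming these fields, using an illustrative payload rather than real `sklab` output (the exact structure may differ):

```python
import json

# Hypothetical report shaped like the fields listed above.
raw = """{
  "quality_score": 95.0,
  "overall_pass": false,
  "checks_run": 28,
  "checks_passed": 26,
  "checks_failed": 2,
  "results": [],
  "summary": {}
}"""

report = json.loads(raw)
if not report["overall_pass"]:
    print(f"{report['checks_failed']} of {report['checks_run']} checks failed "
          f"(quality score: {report['quality_score']})")
# prints: 2 of 28 checks failed (quality score: 95.0)
```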