Skill Lab assigns a quality score from 0-100 to each skill based on the results of 28 static checks. This score reflects how well the skill follows best practices and the Agent Skills specification.

Score Calculation Formula

The quality score is a weighted average of dimension scores:
quality_score = (
    structure_score * 0.30 +
    naming_score * 0.20 +
    description_score * 0.25 +
    content_score * 0.25
)

Dimension Weights

From core/scoring.py:12-18:
| Dimension | Weight | Focus Area |
| --- | --- | --- |
| Structure | 30% | File organization, folder structure, frontmatter validity |
| Naming | 20% | Skill name format and directory matching |
| Description | 25% | Description presence, length, and quality |
| Content | 25% | Body content, examples, token budgets, references |
| Execution | 0% | Evaluated separately via trace analysis (Phase 3) |
The Execution dimension is reserved for trace-based checks that analyze runtime behavior. It does not contribute to the static analysis score.
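The weighted average can be sketched as a small function. This is a minimal illustration rather than the contents of core/scoring.py: the names DIMENSION_WEIGHTS and quality_score are assumptions, and only the weight values come from the table above. Treating a missing dimension as 100 is also an assumption, chosen to mirror how a dimension with no results is scored.

```python
# Weights taken from the Dimension Weights table. The constant and function
# names are illustrative assumptions, not Skill Lab's actual identifiers.
DIMENSION_WEIGHTS = {
    "structure": 0.30,
    "naming": 0.20,
    "description": 0.25,
    "content": 0.25,
    "execution": 0.0,  # reserved for trace analysis; no effect on the static score
}


def quality_score(dimension_scores: dict[str, float]) -> float:
    """Return the weighted average of per-dimension scores (each 0-100).

    Assumption: a dimension missing from the input counts as 100.
    """
    return sum(
        weight * dimension_scores.get(name, 100.0)
        for name, weight in DIMENSION_WEIGHTS.items()
    )


if __name__ == "__main__":
    # Only Structure falls short; every other dimension is perfect.
    print(round(quality_score({"structure": 50.0}), 2))
```

Because the Execution weight is 0, any value supplied for it leaves the result unchanged.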

Dimension Score Calculation

Each dimension score is calculated using severity-weighted results:
def calculate_dimension_score(results: list[CheckResult]) -> float:
    if not results:
        return 100.0
    
    total_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results)
    passed_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results if r.passed)
    
    if total_weight == 0:
        return 100.0
    
    return (passed_weight / total_weight) * 100

Severity Weights

From core/scoring.py:21-25:
| Severity | Weight | Impact |
| --- | --- | --- |
| ERROR | 1.0 | Must fix - spec requirement or critical issue |
| WARNING | 0.5 | Should fix - best practice recommendation |
| INFO | 0.25 | Optional - quality suggestion |
A single failed ERROR check has the same impact as two failed WARNING checks or four failed INFO checks.
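The equivalence between severities can be checked with a short, self-contained sketch. The CheckResult dataclass and the string severity keys below are simplifying assumptions (the real types in core/scoring.py may differ). The scoring function follows the one shown earlier, and the check set of one ERROR, two WARNING, and four INFO checks is invented for the demonstration.

```python
from dataclasses import dataclass

# Weights from the Severity Weights table; string keys are an assumption.
SEVERITY_WEIGHTS = {"ERROR": 1.0, "WARNING": 0.5, "INFO": 0.25}


@dataclass
class CheckResult:
    """Simplified stand-in for Skill Lab's result type (assumed shape)."""
    severity: str
    passed: bool


def calculate_dimension_score(results: list[CheckResult]) -> float:
    """Severity-weighted share of passed checks, on a 0-100 scale."""
    if not results:
        return 100.0
    total_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results)
    passed_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results if r.passed)
    if total_weight == 0:
        return 100.0
    return (passed_weight / total_weight) * 100


# Invented check set: 1 ERROR, 2 WARNING, 4 INFO (each group weighs 1.0 in total).
COUNTS = {"ERROR": 1, "WARNING": 2, "INFO": 4}


def score_when_failing(severity: str) -> float:
    """Score the set with every check of `severity` failed and the rest passed."""
    results = [
        CheckResult(sev, passed=(sev != severity))
        for sev, count in COUNTS.items()
        for _ in range(count)
    ]
    return calculate_dimension_score(results)


scores = {sev: score_when_failing(sev) for sev in COUNTS}
print(scores)  # all three values are equal (about 66.67)
```

In each scenario the failed checks remove exactly 1.0 of the 3.0 total weight, so the three scores match.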

Example Calculation

Let’s calculate the score for a skill with the following results in the Structure dimension:
| Check | Severity | Passed |
| --- | --- | --- |
| SKILL.md Exists | ERROR (1.0) | Yes |
| Valid Frontmatter | ERROR (1.0) | Yes |
| Scripts Valid | WARNING (0.5) | No |
| Standard Fields | WARNING (0.5) | Yes |
Calculation:
total_weight = 1.0 + 1.0 + 0.5 + 0.5 = 3.0
passed_weight = 1.0 + 1.0 + 0.0 + 0.5 = 2.5

structure_score = (2.5 / 3.0) * 100 = 83.33
If all other dimensions score 100%, the final quality score would be:
quality_score = (83.33 * 0.30) + (100 * 0.20) + (100 * 0.25) + (100 * 0.25)
              = 25.00 + 20.00 + 25.00 + 25.00
              = 95.00
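Another way to read this example is in terms of points lost. Because the four static weights sum to 1, the final score equals 100 minus each dimension's shortfall multiplied by its weight. The sketch below applies that to the example; the names WEIGHTS and points_lost are illustrative assumptions.

```python
# Static dimension weights from the table above (names are illustrative).
WEIGHTS = {"structure": 0.30, "naming": 0.20, "description": 0.25, "content": 0.25}


def points_lost(dimension_scores: dict[str, float]) -> dict[str, float]:
    """Points each dimension removes from a perfect 100."""
    return {
        name: (100.0 - dimension_scores[name]) * weight
        for name, weight in WEIGHTS.items()
    }


# The example above: Structure passes 2.5 of 3.0 weight, everything else is perfect.
example = {
    "structure": 2.5 / 3.0 * 100,
    "naming": 100.0,
    "description": 100.0,
    "content": 100.0,
}

lost = points_lost(example)
final = 100.0 - sum(lost.values())
print(f"structure costs {lost['structure']:.2f} points, final score {final:.2f}")
# prints: structure costs 5.00 points, final score 95.00
```

The failed WARNING check lowers the Structure score by 16.67 points, which the 0.30 weight scales down to a 5-point drop overall.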

Check Distribution

The 28 static checks are distributed across dimensions:

Structure (7 checks)

  • SKILL.md Exists (ERROR, spec-required)
  • Valid Frontmatter (ERROR, spec-required)
  • Standard Frontmatter Fields (WARNING)
  • Scripts Valid (WARNING)
  • References Valid (WARNING)
  • Scripts No Interactive Input (WARNING)
  • Scripts Self-Contained (INFO)

Naming (1 check)

  • Name Matches Directory (ERROR, spec-required)

Description (0 behavioral checks)

All description checks are schema-based and are listed under Schema-Based Checks.

Content (11 checks)

  • Body Not Empty (WARNING)
  • Line Budget (WARNING)
  • Has Examples (INFO)
  • Reference Depth (WARNING)
  • Scripts Referenced (WARNING)
  • Script Paths Exist (WARNING)
  • Compatibility Prerequisites (INFO)
  • Token Budget (WARNING)
  • Metadata Token Budget (INFO)
  • Description Actionable (INFO)
  • Asset Paths Exist (WARNING)

Schema-Based Checks (9 checks)

  • Name Required (ERROR, spec-required)
  • Name Format (ERROR, spec-required)
  • Description Required (ERROR, spec-required)
  • Description Not Empty (ERROR, spec-required)
  • Description Max Length (ERROR, spec-required)
  • Compatibility Length (ERROR, spec-required)
  • Metadata Format (ERROR, spec-required)
  • License Format (WARNING)
  • Allowed Tools Format (WARNING)
Spec-required checks (10 total) must pass for the skill to be considered spec-compliant. The remaining 18 checks are quality suggestions.

Spec Compliance vs. Quality

Skill Lab distinguishes between:
  1. Spec Compliance - Does the skill meet the minimum requirements of the Agent Skills specification?
  2. Quality Score - How well does the skill follow best practices?
A skill can be spec-compliant (all 10 spec-required checks pass) but still have a lower quality score if it fails quality suggestions.
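The distinction can be illustrated with a small sketch. The CheckResult shape and its spec_required flag are hypothetical, introduced here only to show the idea; Skill Lab's own result type may expose this differently.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical result record; `spec_required` is an assumed field."""
    name: str
    passed: bool
    spec_required: bool


def is_spec_compliant(results: list[CheckResult]) -> bool:
    """True when every spec-required check passed; other checks are ignored."""
    return all(r.passed for r in results if r.spec_required)


def failed_suggestions(results: list[CheckResult]) -> list[str]:
    """Names of failed checks that are quality suggestions, not spec requirements."""
    return [r.name for r in results if not r.passed and not r.spec_required]


# Invented results: both spec-required checks pass, one quality suggestion fails.
results = [
    CheckResult("SKILL.md Exists", passed=True, spec_required=True),
    CheckResult("Name Matches Directory", passed=True, spec_required=True),
    CheckResult("Has Examples", passed=False, spec_required=False),
]

print(is_spec_compliant(results))   # True
print(failed_suggestions(results))  # ['Has Examples']
```

Here the skill is spec-compliant, yet the failed suggestion would still lower its quality score.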

Spec-Only Evaluation

To check only spec-required constraints:
sklab evaluate ./my-skill -s
This runs only the 10 spec-required checks and ignores quality suggestions.

Validation Modes

| Command | Checks Run | Output |
| --- | --- | --- |
| sklab evaluate | All 28 checks | Detailed report with quality score |
| sklab evaluate -s | 10 spec-required checks only | Spec compliance report |
| sklab validate | All 28 checks | Quick pass/fail (exit code 0 or 1) |
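Since sklab validate reports its result through the exit code, a script can gate on it. The sketch below is one possible wrapper, under two assumptions: that validate accepts a skill path in the same way evaluate does, and that exit code 0 means every check passed. The helper name skill_is_valid is hypothetical.

```python
import subprocess


def skill_is_valid(
    skill_path: str,
    command: tuple[str, ...] = ("sklab", "validate"),
) -> bool:
    """Run the validator on `skill_path` and report whether it exited with 0.

    Assumption: `sklab validate <path>` is a valid invocation. The `command`
    parameter exists so the wrapper can be exercised without sklab installed.
    """
    completed = subprocess.run([*command, skill_path], check=False)
    return completed.returncode == 0
```

A CI step might then call skill_is_valid("./my-skill") and stop the build when it returns False.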

Score Interpretation

| Score Range | Interpretation |
| --- | --- |
| 90-100 | Excellent - follows best practices |
| 75-89 | Good - minor improvements recommended |
| 60-74 | Fair - several quality issues |
| Below 60 | Poor - significant issues to address |
A skill with a score below 100 may still be functional and spec-compliant. The score reflects adherence to best practices, not correctness.
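The bands above translate into a simple lookup. The function name interpret_score is hypothetical, and the handling of fractional scores is an assumption: the table lists whole-number bounds, so this sketch treats each band's lower bound as the threshold (for example, 89.5 is read as Good).

```python
def interpret_score(score: float) -> str:
    """Map a 0-100 quality score to the band label from the table above.

    Assumption: a fractional score belongs to the band whose lower bound
    it meets, so 89.5 is "Good" rather than "Excellent".
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Excellent"
    if score >= 75:
        return "Good"
    if score >= 60:
        return "Fair"
    return "Poor"


for value in (95.0, 89.5, 60, 59.99):
    print(value, interpret_score(value))
```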

JSON Output

For machine-readable output, use the --json flag:
sklab evaluate ./my-skill --json
The JSON output includes:
  • quality_score - Overall score (0-100)
  • overall_pass - Boolean indicating if all checks passed
  • checks_run, checks_passed, checks_failed - Count summaries
  • results - Array of individual check results
  • summary - Breakdown by severity and dimension
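A script consuming this output might look like the sketch below. It relies only on the top-level fields listed above. The sample payload is invented for illustration, and the shapes of results and summary are left empty because their structure is not described here.

```python
import json

# Invented sample payload. Only the top-level field names come from the list
# above; the values and the empty `results`/`summary` are placeholders.
SAMPLE_OUTPUT = """
{
  "quality_score": 95.0,
  "overall_pass": false,
  "checks_run": 28,
  "checks_passed": 27,
  "checks_failed": 1,
  "results": [],
  "summary": {}
}
"""


def summarize(raw: str) -> str:
    """Build a one-line summary from the JSON text of an evaluation report."""
    report = json.loads(raw)
    status = "pass" if report["overall_pass"] else "fail"
    return (
        f"{report['quality_score']:.1f}/100 ({status}): "
        f"{report['checks_passed']}/{report['checks_run']} checks passed"
    )


print(summarize(SAMPLE_OUTPUT))
# prints: 95.0/100 (fail): 27/28 checks passed
```

In practice the input would be the captured standard output of sklab evaluate ./my-skill --json instead of the sample string.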
