Skill Lab assigns each skill a quality score from 0 to 100 based on the results of 28 static checks. The score reflects how well the skill follows best practices and the Agent Skills specification.
The quality score is a weighted average of four dimension scores:

```python
quality_score = (
    structure_score * 0.30 +
    naming_score * 0.20 +
    description_score * 0.25 +
    content_score * 0.25
)
```
## Dimension Weights

From `core/scoring.py:12-18`:

| Dimension | Weight | Focus Area |
|---|---|---|
| Structure | 30% | File organization, folder structure, frontmatter validity |
| Naming | 20% | Skill name format and directory matching |
| Description | 25% | Description presence, length, and quality |
| Content | 25% | Body content, examples, token budgets, references |
| Execution | 0% | Evaluated separately via trace analysis (Phase 3) |
The Execution dimension is reserved for trace-based checks that analyze runtime behavior. It does not contribute to the static analysis score.
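The weighted average can be sketched directly from the table above. The dictionary and function names here are illustrative, not Skill Lab's actual API:

```python
# Static-analysis dimension weights (Execution is excluded at weight 0).
DIMENSION_WEIGHTS = {
    "structure": 0.30,
    "naming": 0.20,
    "description": 0.25,
    "content": 0.25,
}

def combine_dimension_scores(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each 0-100)."""
    return sum(scores[dim] * weight for dim, weight in DIMENSION_WEIGHTS.items())

# A skill scoring 100 in every dimension scores 100 overall.
print(combine_dimension_scores(
    {"structure": 100, "naming": 100, "description": 100, "content": 100}
))  # 100.0
```

Because the four weights sum to 1.0, a uniform set of dimension scores passes through unchanged.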
## Dimension Score Calculation

Each dimension score is calculated using severity-weighted results:

```python
def calculate_dimension_score(results: list[CheckResult]) -> float:
    if not results:
        return 100.0
    total_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results)
    passed_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results if r.passed)
    if total_weight == 0:
        return 100.0
    return (passed_weight / total_weight) * 100
```
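To see the function in action, here is a self-contained sketch with minimal stand-ins for `CheckResult` and `SEVERITY_WEIGHTS`; the real definitions live in Skill Lab's codebase and may differ:

```python
from dataclasses import dataclass

# Stand-ins for Skill Lab's types, for illustration only.
SEVERITY_WEIGHTS = {"ERROR": 1.0, "WARNING": 0.5, "INFO": 0.25}

@dataclass
class CheckResult:
    severity: str
    passed: bool

def calculate_dimension_score(results: list[CheckResult]) -> float:
    if not results:
        return 100.0  # a dimension with no checks scores perfectly
    total_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results)
    passed_weight = sum(SEVERITY_WEIGHTS[r.severity] for r in results if r.passed)
    if total_weight == 0:
        return 100.0
    return (passed_weight / total_weight) * 100

# Two passing ERROR checks and one failing WARNING: 2.0 / 2.5 = 80%.
results = [
    CheckResult("ERROR", True),
    CheckResult("ERROR", True),
    CheckResult("WARNING", False),
]
print(calculate_dimension_score(results))  # 80.0
```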
## Severity Weights

From `core/scoring.py:21-25`:

| Severity | Weight | Impact |
|---|---|---|
| ERROR | 1.0 | Must fix - spec requirement or critical issue |
| WARNING | 0.5 | Should fix - best practice recommendation |
| INFO | 0.25 | Optional - quality suggestion |
A single failed ERROR check has the same impact as two failed WARNING checks or four failed INFO checks.
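That equivalence can be verified with a little arithmetic. Starting from a baseline of four passing ERROR checks, one failed ERROR, two failed WARNINGs, or four failed INFOs all drag the dimension to the same score:

```python
SEVERITY_WEIGHTS = {"ERROR": 1.0, "WARNING": 0.5, "INFO": 0.25}

# Baseline: four passing ERROR checks contribute 4.0 of passed weight.
base = 4 * SEVERITY_WEIGHTS["ERROR"]

# One failed ERROR adds 1.0 of unpassed weight...
one_error = base / (base + SEVERITY_WEIGHTS["ERROR"]) * 100

# ...exactly as much as two failed WARNINGs or four failed INFOs.
two_warnings = base / (base + 2 * SEVERITY_WEIGHTS["WARNING"]) * 100
four_infos = base / (base + 4 * SEVERITY_WEIGHTS["INFO"]) * 100

print(one_error, two_warnings, four_infos)  # 80.0 80.0 80.0
```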
## Example Calculation

Let's calculate the score for a skill with the following results in the Structure dimension:

| Check | Severity | Passed |
|---|---|---|
| SKILL.md Exists | ERROR (1.0) | ✅ |
| Valid Frontmatter | ERROR (1.0) | ✅ |
| Scripts Valid | WARNING (0.5) | ❌ |
| Standard Fields | WARNING (0.5) | ✅ |

Calculation:

```
total_weight  = 1.0 + 1.0 + 0.5 + 0.5 = 3.0
passed_weight = 1.0 + 1.0 + 0.0 + 0.5 = 2.5
structure_score = (2.5 / 3.0) * 100 = 83.33
```
If all other dimensions score 100%, the final quality score would be:

```
quality_score = (83.33 * 0.30) + (100 * 0.20) + (100 * 0.25) + (100 * 0.25)
              = 25.00 + 20.00 + 25.00 + 25.00
              = 95.00
```
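The worked example above can be checked in a few lines of Python:

```python
# Structure dimension: ERROR passed, ERROR passed, WARNING failed, WARNING passed.
total_weight = 1.0 + 1.0 + 0.5 + 0.5    # 3.0
passed_weight = 1.0 + 1.0 + 0.0 + 0.5   # 2.5
structure_score = (passed_weight / total_weight) * 100  # 83.33...

# All other dimensions are assumed perfect (100).
quality_score = (structure_score * 0.30
                 + 100 * 0.20 + 100 * 0.25 + 100 * 0.25)
print(round(quality_score, 2))  # 95.0
```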
## Check Distribution

The 28 static checks are distributed across dimensions:

### Structure (7 checks)
- SKILL.md Exists (ERROR, spec-required)
- Valid Frontmatter (ERROR, spec-required)
- Standard Frontmatter Fields (WARNING)
- Scripts Valid (WARNING)
- References Valid (WARNING)
- Scripts No Interactive Input (WARNING)
- Scripts Self-Contained (INFO)
### Naming (1 check)
- Name Matches Directory (ERROR, spec-required)
### Description (0 behavioral checks)

All description checks are schema-based.
### Content (11 checks)
- Body Not Empty (WARNING)
- Line Budget (WARNING)
- Has Examples (INFO)
- Reference Depth (WARNING)
- Scripts Referenced (WARNING)
- Script Paths Exist (WARNING)
- Compatibility Prerequisites (INFO)
- Token Budget (WARNING)
- Metadata Token Budget (INFO)
- Description Actionable (INFO)
- Asset Paths Exist (WARNING)
### Schema-Based Checks (9 checks)
- Name Required (ERROR, spec-required)
- Name Format (ERROR, spec-required)
- Description Required (ERROR, spec-required)
- Description Not Empty (ERROR, spec-required)
- Description Max Length (ERROR, spec-required)
- Compatibility Length (ERROR, spec-required)
- Metadata Format (ERROR, spec-required)
- License Format (WARNING)
- Allowed Tools Format (WARNING)
Spec-required checks (10 total) must pass for the skill to be considered spec-compliant. The remaining 18 checks are quality suggestions.
## Spec Compliance vs. Quality
Skill Lab distinguishes between:
- Spec Compliance - Does the skill meet the minimum requirements of the Agent Skills specification?
- Quality Score - How well does the skill follow best practices?
A skill can be spec-compliant (all 10 spec-required checks pass) but still have a lower quality score if it fails quality suggestions.
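The distinction can be illustrated with a small sketch. The field names here are hypothetical, chosen only for the example:

```python
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    spec_required: bool
    passed: bool

def is_spec_compliant(results: list[CheckResult]) -> bool:
    """Compliant only if every spec-required check passed."""
    return all(r.passed for r in results if r.spec_required)

results = [
    CheckResult("SKILL.md Exists", spec_required=True, passed=True),
    CheckResult("Has Examples", spec_required=False, passed=False),
]
print(is_spec_compliant(results))  # True: only a quality suggestion failed
```

Failed quality suggestions lower the score but, as shown, do not affect compliance.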
### Spec-Only Evaluation

To check only spec-required constraints:

```shell
sklab evaluate ./my-skill -s
```

This runs only the 10 spec-required checks and ignores quality suggestions.
## Validation Modes

| Command | Checks Run | Output |
|---|---|---|
| `sklab evaluate` | All 28 checks | Detailed report with quality score |
| `sklab evaluate -s` | 10 spec-required checks only | Spec compliance report |
| `sklab validate` | All 28 checks | Quick pass/fail (exit code 0 or 1) |
## Score Interpretation

| Score Range | Interpretation |
|---|---|
| 90-100 | Excellent - follows best practices |
| 75-89 | Good - minor improvements recommended |
| 60-74 | Fair - several quality issues |
| Below 60 | Poor - significant issues to address |
A skill with a score below 100 may still be functional and spec-compliant. The score reflects adherence to best practices, not correctness.
## JSON Output

For machine-readable output, use the `--json` flag:

```shell
sklab evaluate ./my-skill --json
```
The JSON output includes:

- `quality_score` - Overall score (0-100)
- `overall_pass` - Boolean indicating if all checks passed
- `checks_run`, `checks_passed`, `checks_failed` - Count summaries
- `results` - Array of individual check results
- `summary` - Breakdown by severity and dimension
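A minimal sketch of consuming these fields, using an illustrative payload rather than real `sklab` output (the exact structure may differ):

```python
import json

# Hypothetical report shaped like the fields listed above.
raw = """{
  "quality_score": 95.0,
  "overall_pass": false,
  "checks_run": 28,
  "checks_passed": 26,
  "checks_failed": 2,
  "results": [],
  "summary": {}
}"""

report = json.loads(raw)
if not report["overall_pass"]:
    print(f"{report['checks_failed']} of {report['checks_run']} checks failed "
          f"(quality score: {report['quality_score']})")
# prints: 2 of 28 checks failed (quality score: 95.0)
```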