Usage Examples - Trustworthy Model Registry

Quick Start Example

Here’s a complete workflow from installation to model evaluation:

# Clone the repository
git clone https://github.com/GingerlyData247/SOTeam4-P2.git
cd SOTeam4-P2

# Make executable and install dependencies
chmod +x run.py
./run.py install

# Verify installation
./run.py test

# Create a URL file
echo "https://huggingface.co/facebook/wav2vec2-base" > urls.txt

# Run evaluation
./run.py urls.txt

Evaluating a Single Model

Create URL File

The CLI requires a text file containing URLs. For a single model:

cat > model.txt << 'EOF'
https://huggingface.co/openai/whisper-tiny
EOF

Run Evaluation

./run.py model.txt

Example Output

{"name":"openai/whisper-tiny","category":"MODEL","reproducibility":0.9,"reproducibility_latency":1243,"license":0.95,"license_latency":876,"size_score":{"raspberry_pi":0.6,"jetson_nano":0.7,"desktop_pc":0.95,"aws_server":1.0},"size_score_latency":543,"lineage":0.85,"lineage_latency":1567,"reviewedness":0.92,"reviewedness_latency":2341,"bus_factor":0.0,"bus_factor_latency":0,"code_quality":0.0,"code_quality_latency":0,"net_score":0.7456,"net_score_latency":6570}

Metrics like bus_factor and code_quality may return 0 if no GitHub repository is linked to the model.

Batch Evaluation of Multiple Models

Create Multi-Model URL File

You can list multiple URLs, one per line:

cat > batch_models.txt << 'EOF'
https://huggingface.co/facebook/wav2vec2-base
https://huggingface.co/openai/whisper-tiny
https://huggingface.co/bert-base-uncased
https://huggingface.co/distilbert-base-uncased
https://huggingface.co/microsoft/deberta-v3-base
EOF

Run Batch Evaluation

./run.py batch_models.txt > results.ndjson

Process Results

The output is in NDJSON format (newline-delimited JSON), where each line is a complete JSON object:

# Count total models evaluated
wc -l results.ndjson

# Extract just the model names and net scores
jq -r '[.name, .net_score] | @tsv' results.ndjson

# Find models with net_score > 0.8
jq 'select(.net_score > 0.8)' results.ndjson

# Calculate average net_score
jq -s 'map(.net_score) | add / length' results.ndjson
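If jq is not available, the same processing can be done in Python. The following is a minimal standard-library sketch. parse_ndjson, name_score_rows, and above_threshold are illustrative helper names, not part of the CLI.

```python
import json
from typing import Iterable


def parse_ndjson(lines: Iterable[str]) -> list[dict]:
    """Parse NDJSON: one JSON object per non-blank line."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            records.append(json.loads(line))
    return records


def name_score_rows(records: list[dict]) -> list[tuple[str, float]]:
    """Same data as: jq -r '[.name, .net_score] | @tsv'"""
    return [(r["name"], r["net_score"]) for r in records]


def above_threshold(records: list[dict], threshold: float = 0.8) -> list[dict]:
    """Same filter as: jq 'select(.net_score > 0.8)'"""
    return [r for r in records if r["net_score"] > threshold]
```

To use it, open results.ndjson and pass the file object to parse_ndjson. The length of the returned list matches the wc -l count when the file has no blank lines.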

Comma-Separated URLs

The CLI also supports comma-separated URLs on the same line:

cat > urls_comma.txt << 'EOF'
https://huggingface.co/facebook/wav2vec2-base, https://huggingface.co/openai/whisper-tiny
https://huggingface.co/bert-base-uncased, https://huggingface.co/distilbert-base-uncased
EOF

./run.py urls_comma.txt
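This guide does not document how the CLI parses a URL file internally. The following minimal sketch assumes that every comma on a line separates two URLs and that blank entries are ignored. Under those assumptions, it shows how a file that mixes one-per-line and comma-separated layouts flattens into a single list of URLs. parse_url_file_text is a hypothetical name.

```python
def parse_url_file_text(text: str) -> list[str]:
    """Flatten URL-file contents into a list of URLs (hypothetical helper).

    Accepts one URL per line, several comma-separated URLs on a line,
    or a mix of both. Surrounding whitespace and empty entries are dropped.
    """
    urls = []
    for line in text.splitlines():
        for part in line.split(","):
            url = part.strip()
            if url:
                urls.append(url)
    return urls
```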

Working with Different Model Sources

Hugging Face Models

cat > hf_models.txt << 'EOF'
https://huggingface.co/facebook/bart-large
https://huggingface.co/google/flan-t5-base
https://huggingface.co/EleutherAI/gpt-neo-1.3B
EOF

./run.py hf_models.txt

GitHub Repositories

You can also evaluate code repositories:

cat > github_repos.txt << 'EOF'
https://github.com/pytorch/fairseq
https://github.com/huggingface/transformers
EOF

./run.py github_repos.txt

GitHub repositories are classified as CODE, not MODEL, and will have limited metric evaluation.

Hugging Face Datasets

cat > datasets.txt << 'EOF'
https://huggingface.co/datasets/squad
https://huggingface.co/datasets/common_voice
EOF

./run.py datasets.txt

Dataset evaluation is currently limited in Phase 1. The primary focus is on MODEL resources.
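This guide states that github.com URLs are classified as CODE and Hugging Face model URLs as MODEL. The following minimal sketch classifies URLs by host and path in that way. The DATASET label for huggingface.co/datasets/ URLs and the UNKNOWN fallback are assumptions. classify_url is a hypothetical helper, not the CLI's own code.

```python
from urllib.parse import urlparse

GITHUB_HOSTS = {"github.com", "www.github.com"}
HF_HOSTS = {"huggingface.co", "www.huggingface.co"}


def classify_url(url: str) -> str:
    """Guess a resource category from a URL (hypothetical helper).

    github.com               -> CODE
    huggingface.co/datasets/ -> DATASET (assumed label)
    other huggingface.co     -> MODEL
    anything else            -> UNKNOWN (assumed fallback)
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path_parts = [p for p in parsed.path.split("/") if p]
    if host in GITHUB_HOSTS:
        return "CODE"
    if host in HF_HOSTS:
        if path_parts and path_parts[0] == "datasets":
            return "DATASET"
        return "MODEL"
    return "UNKNOWN"
```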

Using Environment Variables

Enable Debug Logging

LOG_LEVEL=2 ./run.py urls.txt

This writes detailed logging to stderr, including per-metric progress messages such as:

2026-03-04 14:23:45 INFO → Running metric: reproducibility for facebook/wav2vec2-base
2026-03-04 14:23:46 INFO ✓ Finished metric: reproducibility (1243 ms)
2026-03-04 14:23:46 INFO → Running metric: license for facebook/wav2vec2-base
...

Log to File

LOG_FILE=model-eval.log LOG_LEVEL=1 ./run.py urls.txt

Then follow the log file as it is written:

tail -f model-eval.log

Disable Progress Bars

HF_HUB_DISABLE_PROGRESS_BARS=1 TQDM_DISABLE=1 ./run.py urls.txt
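The LOG_LEVEL and LOG_FILE variables shown above could be implemented in a Python tool with the standard logging module. The following is a minimal sketch. This guide only shows levels 1 and 2, so treating 0 or an unset value as silent is an assumption, and the logger name is hypothetical. The format string reproduces the layout of the sample log lines earlier in this section.

```python
import logging
import os
import sys
from typing import Mapping, Optional


def configure_logging(env: Optional[Mapping[str, str]] = None) -> logging.Logger:
    """Configure a logger from LOG_LEVEL and LOG_FILE.

    Assumed levels: 0 or unset = silent, 1 = INFO, 2 or higher = DEBUG.
    Logs go to LOG_FILE if it is set, otherwise to stderr.
    """
    env = os.environ if env is None else env
    logger = logging.getLogger("model_eval")  # hypothetical logger name
    logger.propagate = False
    for old in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(old)
        old.close()

    try:
        level_num = int(env.get("LOG_LEVEL", "0"))
    except ValueError:
        level_num = 0

    if level_num <= 0:
        # A level above CRITICAL suppresses every record.
        logger.setLevel(logging.CRITICAL + 1)
        logger.addHandler(logging.NullHandler())
        return logger

    logger.setLevel(logging.INFO if level_num == 1 else logging.DEBUG)
    log_file = env.get("LOG_FILE")
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S"))
    logger.addHandler(handler)
    return logger
```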

Advanced Workflows

Filter High-Quality Models

Evaluate a batch of models and filter for high net_score:

./run.py batch_models.txt | jq 'select(.net_score > 0.8)' > high_quality.ndjson

Generate CSV Report

Convert NDJSON output to CSV:

./run.py batch_models.txt | \
  jq -r '[.name, .net_score, .reproducibility, .license, .reviewedness] | @csv' > report.csv
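The same report can be produced in Python with the csv module. The following is a minimal sketch. ndjson_to_csv is a hypothetical helper. It differs slightly from jq's @csv: jq always quotes string fields, while csv.writer quotes only when a field needs it. Missing fields become empty cells.

```python
import csv
import io
import json
from typing import Iterable

COLUMNS = ["name", "net_score", "reproducibility", "license", "reviewedness"]


def ndjson_to_csv(lines: Iterable[str], header: bool = False) -> str:
    """Convert NDJSON lines to CSV text using the columns above.

    Set header=True to emit a row of column names first.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    if header:
        writer.writerow(COLUMNS)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        writer.writerow([record.get(col, "") for col in COLUMNS])
    return buf.getvalue()
```

When writing the result to report.csv, open the file with newline="" so the "\r\n" row endings that csv.writer produces are not translated again.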

Sort by Net Score

Evaluate and sort models by trustworthiness:

./run.py batch_models.txt | \
  jq -s 'sort_by(-.net_score)[] | {name, net_score}'

Extract Specific Metrics

Get only reproducibility and license scores:

./run.py urls.txt | jq '{name, reproducibility, license}'

Understanding Output Fields

Each output JSON object contains:

Core Fields

name (string): Model identifier (e.g., facebook/wav2vec2-base)

category (string): Resource type, e.g. MODEL or CODE

Metric Scores

Each metric has two fields:

{metric} (float): Score from 0.0 (worst) to 1.0 (best). size_score is the exception: it is an object with one such score per hardware target.

{metric}_latency (integer): Time in milliseconds to compute this metric

Available Metrics

reproducibility - Can the model be reproduced?
license - Is licensing clear and permissive?
size_score - Hardware compatibility (object with raspberry_pi, jetson_nano, desktop_pc, aws_server)
lineage - Are dependencies and data sources documented?
reviewedness - Quality of documentation and examples
bus_factor - Contributor diversity (requires GitHub repo)
code_quality - CI/CD and testing infrastructure (requires GitHub repo)
net_score - Overall trustworthiness score
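The field descriptions above can be turned into a validator that checks whether downstream tooling receives records of the expected shape. The following is a minimal sketch. It assumes every listed metric, including net_score, appears in each record together with its _latency field, as in the example output. validate_record and validate_line are hypothetical helpers.

```python
import json

# Metrics listed in this guide; each has a matching *_latency field.
METRICS = ["reproducibility", "license", "size_score", "lineage",
           "reviewedness", "bus_factor", "code_quality", "net_score"]
DEVICES = ["raspberry_pi", "jetson_nano", "desktop_pc", "aws_server"]


def _is_unit_score(value) -> bool:
    """True for a real number (not a bool) between 0.0 and 1.0 inclusive."""
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and 0.0 <= value <= 1.0)


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    if not isinstance(record.get("name"), str):
        problems.append("name must be a string")
    for metric in METRICS:
        value = record.get(metric)
        if metric == "size_score":
            if not isinstance(value, dict):
                problems.append("size_score must be an object")
            else:
                for device in DEVICES:
                    if not _is_unit_score(value.get(device)):
                        problems.append(f"size_score.{device} must be a number in [0, 1]")
        elif not _is_unit_score(value):
            problems.append(f"{metric} must be a number in [0, 1]")
        latency = record.get(f"{metric}_latency")
        if not isinstance(latency, int) or isinstance(latency, bool) or latency < 0:
            problems.append(f"{metric}_latency must be a non-negative integer")
    return problems


def validate_line(line: str) -> list[str]:
    """Validate one NDJSON line of CLI output."""
    return validate_record(json.loads(line))
```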

Performance Tuning

Parallel Processing

The CLI automatically processes models in parallel using up to 8 workers. For very large batches, you can monitor progress on stderr while still saving the results to a file:

LOG_LEVEL=1 ./run.py large_batch.txt 2>&1 >results.ndjson | grep "Running metric"
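This guide does not say how the workers are implemented. The following sketch assumes a thread pool, which is a common choice for network-bound work, capped at the 8 workers described above. evaluate is a hypothetical callable that stands in for the evaluation of one model. This sketch returns results in input order. The real CLI may emit lines in a different order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

MAX_WORKERS = 8  # the worker cap described above


def evaluate_all(urls: Iterable[str], evaluate: Callable[[str], dict]) -> list[dict]:
    """Evaluate URLs concurrently with at most MAX_WORKERS threads.

    Executor.map yields results in submission order, so the output order
    is deterministic even when later URLs finish first.
    """
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(evaluate, urls))
```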

Timeouts

Each metric has a 90-second timeout. A metric that exceeds it scores 0.0, a warning is logged, and processing continues:

2026-03-04 14:25:30 WARNING metric:bus_factor timed out after 90s.
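A per-metric timeout of this kind can be sketched with concurrent.futures. The caller waits up to the limit for the metric's future. On timeout, it logs a warning in the format shown above and falls back to 0.0. run_metric_with_timeout and the logger name are hypothetical. Python threads cannot be killed, so in this sketch a timed-out metric keeps running in the background until it returns.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError
from typing import Callable

logger = logging.getLogger("model_eval")  # hypothetical logger name
METRIC_TIMEOUT_S = 90  # per-metric limit described above


def run_metric_with_timeout(name: str, compute: Callable[[], float],
                            pool: ThreadPoolExecutor,
                            timeout_s: float = METRIC_TIMEOUT_S) -> float:
    """Return the metric score, or 0.0 if it does not finish in time."""
    future = pool.submit(compute)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeoutError:
        logger.warning("metric:%s timed out after %ss.", name, timeout_s)
        # cancel() only helps if the task has not started yet; a running
        # thread cannot be interrupted and finishes in the background.
        future.cancel()
        return 0.0
```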

Memory Usage

For very large batches, process the URL file in chunks to limit memory use:

# Split into chunks of 10 URLs each
split -l 10 large_urls.txt chunk_

# Process each chunk
for chunk in chunk_*; do
  ./run.py "$chunk" >> all_results.ndjson
done

Troubleshooting Examples

No Output Produced

Problem: Running ./run.py urls.txt produces no output.

Solution: Check that the URL file exists and contains valid URLs:

cat urls.txt

Verify the file path is correct:

ls -la urls.txt
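The manual checks above can be scripted to run before a long evaluation. The following is a minimal pre-flight sketch. check_url_text and check_url_file are hypothetical helpers. They accept the one-per-line and comma-separated layouts described earlier. They only check URL syntax, not whether the URLs are reachable.

```python
from pathlib import Path
from urllib.parse import urlparse


def check_url_text(text: str) -> list[str]:
    """Return problems found in URL-file contents (empty list = looks OK)."""
    urls = [part.strip() for line in text.splitlines()
            for part in line.split(",") if part.strip()]
    if not urls:
        return ["no URLs found"]
    problems = []
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an http(s) URL: {url!r}")
    return problems


def check_url_file(path: str) -> list[str]:
    """Check that the file exists, then validate its contents."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: file not found"]
    return check_url_text(p.read_text(encoding="utf-8"))
```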

Metrics Return All Zeros

Problem: All metrics show 0.0 scores.

Solution: Enable debug logging to see what’s failing:

LOG_LEVEL=2 ./run.py urls.txt 2>&1 | less

Check if the model URL is accessible:

curl -I https://huggingface.co/your/model

Slow Evaluation

Problem: Evaluation takes a very long time.

Solution: Some metrics (especially those requiring GitHub repo cloning) can be slow. Monitor which metrics are hanging:

LOG_LEVEL=1 ./run.py urls.txt 2>&1 | grep -E "Running metric|Finished metric|timed out"
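To see which metrics dominate run time in a saved log (for example, one written with LOG_FILE), you can tally the "Finished metric" and "timed out" lines. The following minimal sketch assumes that log lines follow exactly the formats shown earlier in this guide. Those lines do not include the model name, so the sketch aggregates per metric only. summarize_log is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Iterable

FINISHED = re.compile(r"Finished metric: (\w+) \((\d+) ms\)")
TIMED_OUT = re.compile(r"metric:(\w+) timed out")


def summarize_log(lines: Iterable[str]) -> tuple[dict, dict]:
    """Return ({metric: slowest_ms}, {metric: timeout_count})."""
    slowest: dict = {}
    timeouts: defaultdict = defaultdict(int)
    for line in lines:
        if m := FINISHED.search(line):
            name, ms = m.group(1), int(m.group(2))
            slowest[name] = max(ms, slowest.get(name, 0))
        elif m := TIMED_OUT.search(line):
            timeouts[m.group(1)] += 1
    return slowest, dict(timeouts)
```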

Integration Examples

CI/CD Pipeline

Use the CLI in a GitHub Actions workflow:

name: Evaluate Models

on:
  push:
    paths:
      - 'models.txt'

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          chmod +x run.py
          ./run.py install

      - name: Evaluate models
        run: ./run.py models.txt > results.ndjson

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: evaluation-results
          path: results.ndjson
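To make the workflow fail when a model scores too low, a small gate script could run after the evaluation step. The following is a minimal sketch. The 0.5 threshold, the file name gate.py, and the helper names are hypothetical. A record without a net_score is treated as failing.

```python
import json
import sys
from typing import Iterable

MIN_NET_SCORE = 0.5  # hypothetical threshold; pick one that suits your project


def failing_models(lines: Iterable[str], threshold: float = MIN_NET_SCORE) -> list[str]:
    """Names of models whose net_score is below the threshold."""
    failures = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("net_score", 0.0) < threshold:
            failures.append(record.get("name", "<unnamed>"))
    return failures


def main(path: str) -> int:
    """Return a process exit code: 1 if any model fails, else 0."""
    with open(path, encoding="utf-8") as fh:
        failures = failing_models(fh)
    for name in failures:
        print(f"net_score below {MIN_NET_SCORE}: {name}", file=sys.stderr)
    return 1 if failures else 0
```

If saved as gate.py next to run.py, it could run as an extra step with: python -c "import sys, gate; sys.exit(gate.main('results.ndjson'))"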

Automated Reporting

Generate a daily report of model quality:

#!/bin/bash
set -euo pipefail

DATE=$(date +%Y-%m-%d)
./run.py production_models.txt > "results_${DATE}.ndjson"

# Generate summary statistics
jq -s '{
  total: length,
  avg_score: (map(.net_score) | add / length),
  high_quality: (map(select(.net_score > 0.8)) | length)
}' "results_${DATE}.ndjson" > "summary_${DATE}.json"

# Send to monitoring system
curl -X POST https://monitoring.example.com/metrics \
  -H "Content-Type: application/json" \
  -d @"summary_${DATE}.json"
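In the jq summary above, add / length has nothing to divide when the results file is empty, so jq is likely to report an error instead of producing a summary. The following Python sketch computes the same three fields and returns null for avg_score in that case. summarize and summary_json are hypothetical helpers.

```python
import json
from typing import Iterable, Optional


def summarize(records: list[dict], threshold: float = 0.8) -> dict:
    """Compute total, avg_score, and high_quality like the jq summary.

    avg_score is None (JSON null) for an empty result set instead of raising.
    """
    scores = [r["net_score"] for r in records]
    avg: Optional[float] = sum(scores) / len(scores) if scores else None
    return {
        "total": len(scores),
        "avg_score": avg,
        "high_quality": sum(1 for s in scores if s > threshold),
    }


def summary_json(lines: Iterable[str], threshold: float = 0.8) -> str:
    """Parse NDJSON lines and return the summary as a JSON string."""
    records = [json.loads(line) for line in lines if line.strip()]
    return json.dumps(summarize(records, threshold))
```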

Next Steps

API Reference - Use the REST API for programmatic access
Trust Metrics - Learn how each metric is calculated
Development Guide - Contribute new metrics or features
Deployment - Deploy to AWS for production use
