Artifacts in the Trustworthy Model Registry follow a well-defined lifecycle from ingestion to deletion. This page documents each stage and the transitions between them.

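Lifecycle overview

The lifecycle has six stages, covered in order below: ingest, reviewedness gate, rating, storage, download, and delete.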

Stage 1: Ingest

Entry point: POST /artifact/{artifact_type}
Source: src/api/routers/models.py:931

Model ingestion

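For model artifacts, ingestion normalizes the source URL, fetches license metadata, computes metrics, extracts lineage, creates the registry entry, and uploads a ZIP artifact: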
# From src/api/routers/models.py:322-396
def _ingest_hf_core(source_url: str) -> Dict[str, Any]:
    # 1. Normalize Hugging Face ID
    hf_id = normalize_hf_id(source_url)
    
    # 2. Fetch license metadata
    hf_license = _fetch_hf_license(hf_id)
    
    # 3. Compute all metrics
    metrics = compute_metrics_for_model(base_resource)
    
    # 4. Extract parent models (lineage)
    parents = extract_parents_from_resource(enriched)
    
    # 5. Create registry entry
    created = _registry.create(mc)
    
    # 6. Generate ZIP artifact
    # 7. Upload to S3
    # 8. Return created artifact

1. URL normalization

Convert the input URL to a canonical Hugging Face model ID:
# src/utils/hf_normalize.py
# Input:  "https://huggingface.co/bert-base-uncased"
# Output: "bert-base-uncased"

2. License extraction

Fetch license information from Hugging Face API:
# src/api/routers/models.py:296-310
api_url = f"https://huggingface.co/api/models/{hf_id}"
resp = requests.get(api_url, timeout=10)
license = resp.json().get("license", "").lower()
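
One detail worth noting about the excerpt above: `dict.get("license", "")` returns `None` when the key is present with a null value, and `None.lower()` raises `AttributeError`. The following is a minimal sketch of a more defensive extraction step, not the registry's actual code. The helper name `extract_license` and the fallback to a `license:<id>` entry in the `tags` list are assumptions made for illustration.

```python
from typing import Any, Dict


def extract_license(payload: Dict[str, Any]) -> str:
    """Return a lowercase license string, or "" when none can be found."""
    value = payload.get("license")
    if not value:
        # Assumed fallback: look for a "license:<id>" entry in the tags list.
        for tag in payload.get("tags") or []:
            if isinstance(tag, str) and tag.startswith("license:"):
                value = tag.split(":", 1)[1]
                break
    # str(...) guards against non-string values; "or ''" covers None.
    return str(value or "").strip().lower()
```

The function takes the already-decoded JSON payload, so it can be exercised without a network call.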

3. Metric computation

Compute all 13+ metrics via the scoring service:
# src/run.py:284-325
metrics = load_metrics()  # Discover all metric modules

for name, func in metrics.items():
    score, latency = run_with_timeout(func, resource, timeout=90)
    results[name] = (score, latency)

4. Lineage extraction

Extract parent model references from config.json:
# src/metrics/treescore.py:190-256
cfg = resource.get("config") or {}
candidate_keys = (
    "base_model",
    "teacher_model",
    "parent_model",
    "source_model",
)
parents = [cfg[k] for k in candidate_keys if k in cfg]
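
The list comprehension above assumes every candidate key holds a single string. As a hedged sketch, assuming that a config value may also be a list of model IDs or `None`, a variant that flattens, de-duplicates, and skips empty values could look like this. The function name `extract_parents` is hypothetical and is not the helper in src/metrics/treescore.py.

```python
from typing import Any, Dict, List

CANDIDATE_KEYS = ("base_model", "teacher_model", "parent_model", "source_model")


def extract_parents(cfg: Dict[str, Any]) -> List[str]:
    """Collect parent model IDs from a config dict, preserving key order."""
    parents: List[str] = []
    for key in CANDIDATE_KEYS:
        value = cfg.get(key)
        # Treat a single value as a one-element list so both shapes are handled.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            if isinstance(item, str) and item.strip():
                name = item.strip()
                if name not in parents:  # keep the first occurrence only
                    parents.append(name)
    return parents
```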

Dataset and code ingestion

Dataset and code artifacts follow a simplified flow:
# From src/api/routers/models.py:1008-1043
if artifact_type in ("dataset", "code"):
    mc = ModelCreate(
        name=final_name,
        version="1.0.0",
        card="",
        tags=[],
        metadata={},
        source_uri=body.url,
    )
    created = _registry.create(mc)
    created["metadata"]["type"] = artifact_type

Note: Dataset and code artifacts are not scored during ingestion. Only model artifacts undergo metric computation.

Stage 2: Reviewedness gate

Source: src/api/routers/models.py:356-363

Before ingestion completes, models must pass a quality threshold:
# From src/api/routers/models.py:351-363
reviewedness = float(metrics.get("reviewedness", 0.0) or 0.0)

if reviewedness < 0.5:
    raise HTTPException(
        status_code=424,
        detail=f"Ingest rejected: reviewedness={reviewedness:.2f} < 0.50",
    )

Gate behavior

| Reviewedness score | Result | HTTP status |
| --- | --- | --- |
| >= 0.5 | Ingestion proceeds | 201 Created |
| < 0.5 | Ingestion rejected | 424 Failed Dependency |

The reviewedness gate prevents low-quality or untrusted models from entering the registry. Models below the threshold must improve their download count, likes, or documentation before ingestion succeeds.
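
The gate can be exercised in isolation. This is a minimal sketch, assuming a plain exception class (`IngestRejected`, hypothetical) in place of FastAPI's `HTTPException` so the example runs without third-party packages. The threshold and message format are taken from the excerpt above.

```python
from typing import Any, Dict

THRESHOLD = 0.5


class IngestRejected(Exception):
    """Hypothetical stand-in for HTTPException, carrying a status code."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def check_reviewedness(metrics: Dict[str, Any]) -> float:
    """Return the reviewedness score, or raise when it is below the gate."""
    # "or 0.0" maps a stored None to 0.0, so a missing score fails the gate.
    reviewedness = float(metrics.get("reviewedness", 0.0) or 0.0)
    if reviewedness < THRESHOLD:
        raise IngestRejected(
            424, f"Ingest rejected: reviewedness={reviewedness:.2f} < 0.50"
        )
    return reviewedness
```

A score of exactly 0.5 passes, because the comparison is strict; a missing or `None` score is treated as 0.0 and is rejected.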

Stage 3: Rating

Source: src/services/scoring.py

All model metrics are computed by the ScoringService:
# From src/services/scoring.py:180-292
def rate(self, resource: Any) -> Dict[str, Any]:
    # 1. Normalize to HF id
    hf_id = normalize_hf_id(raw_name)
    
    # 2. Build resource dict
    base_resource = {
        "name": hf_id,
        "url": f"https://huggingface.co/{hf_id}",
        "github_url": None,
        "local_path": None,
    }
    
    # 3. Compute all metrics
    metrics = compute_metrics_for_model(base_resource)
    
    # 4. Normalize size_score to dict format
    # 5. Return ModelRating-compatible dict
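
Step 4 in the excerpt is not shown in full. The following sketch illustrates one way a size score could be normalized to the per-device dict that appears in registry.json. The device keys come from the registry format on this page; the function name, the choice to broadcast a scalar to every device, and the 0.0 default for missing devices are all assumptions.

```python
import math
from typing import Any, Dict

DEVICES = ("raspberry_pi", "jetson_nano", "desktop_pc", "aws_server")


def _unit(value: Any) -> float:
    """Coerce a value to a float in [0, 1]; unusable values become 0.0."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return 0.0
    if math.isnan(number):
        return 0.0
    return max(0.0, min(1.0, number))


def normalize_size_score(raw: Any) -> Dict[str, float]:
    """Return a dict with one clamped score per device."""
    if isinstance(raw, dict):
        # Assumption: devices missing from the input score 0.0.
        return {device: _unit(raw.get(device)) for device in DEVICES}
    # Assumption: a scalar score applies equally to every device.
    value = _unit(raw)
    return {device: value for device in DEVICES}
```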

Metric execution

Metrics run with timeouts to prevent hangs:
# From src/run.py:298-306
for name, func in metrics.items():
    try:
        score, latency = run_with_timeout(
            func, resource, timeout=90, label=f"metric:{name}"
        )
        score = float(max(0.0, min(1.0, score)))
    except Exception:
        score, latency = 0.0, 0
    
    results[name] = (score, latency)

Note: Metric failures degrade gracefully to (0.0, 0) without crashing the entire rating process.
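
The loop above depends on a `run_with_timeout` helper whose body is not shown on this page. As a rough sketch of how such a helper might work, the version below uses a single-worker thread pool. The names `run_with_timeout_sketch` and `run_all_metrics` are hypothetical, and reporting latency in whole milliseconds is an assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Dict, Tuple


def run_with_timeout_sketch(
    func: Callable[[Any], float], resource: Any, timeout: float = 90.0, label: str = ""
) -> Tuple[float, int]:
    """Run func(resource); return (score, latency_ms) or raise TimeoutError."""
    executor = ThreadPoolExecutor(max_workers=1)
    start = time.perf_counter()
    try:
        score = executor.submit(func, resource).result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(f"{label or 'metric'} exceeded {timeout}s") from None
    finally:
        # Do not wait for a hung worker: the thread is abandoned, not killed.
        executor.shutdown(wait=False)
    return score, int((time.perf_counter() - start) * 1000)


def run_all_metrics(
    metrics: Dict[str, Callable[[Any], float]], resource: Any, timeout: float = 90.0
) -> Dict[str, Tuple[float, int]]:
    """Run every metric, clamping scores to [0, 1] and mapping failures to (0.0, 0)."""
    results: Dict[str, Tuple[float, int]] = {}
    for name, func in metrics.items():
        try:
            score, latency = run_with_timeout_sketch(
                func, resource, timeout=timeout, label=f"metric:{name}"
            )
            score = float(max(0.0, min(1.0, score)))
        except Exception:
            score, latency = 0.0, 0
        results[name] = (score, latency)
    return results
```

A thread-based timeout stops the caller from waiting but cannot interrupt the metric itself, so a hung metric keeps its thread until it returns. A process-based runner would be needed to reclaim resources forcibly.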

Stage 4: Storage

Registry persistence

Source: src/services/registry.py

The registry stores artifacts in S3 as JSON:
# From src/services/registry.py:90-114
def create(self, m) -> Dict[str, Any]:
    self._load()  # Reload from S3
    self._id_counter += 1
    new_id = str(self._id_counter)
    
    entry = {
        "id": new_id,
        "name": m.name,
        "version": m.version,
        "metadata": dict(m.metadata),
    }
    
    self._models.append(entry)
    self._save()  # Write back to S3
    return entry
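
To experiment with the load, increment, save cycle without S3, the same pattern can be reproduced against a local JSON file. This is a sketch under that assumption: `FileRegistry` is a hypothetical class, not the one in src/services/registry.py, and it writes the `models` and `id_counter` keys shown in the registry structure on this page.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


class FileRegistry:
    """File-backed stand-in for the S3-backed registry (illustrative only)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._models: List[Dict[str, Any]] = []
        self._id_counter = 0

    def _load(self) -> None:
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                doc = json.load(f)
            self._models = doc.get("models", [])
            self._id_counter = int(doc.get("id_counter", 0))

    def _save(self) -> None:
        doc = {"models": self._models, "id_counter": self._id_counter}
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(doc, f, indent=2)

    def create(self, name: str, version: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
        self._load()  # pick up entries written by other instances
        self._id_counter += 1
        entry = {
            "id": str(self._id_counter),
            "name": name,
            "version": version,
            "metadata": dict(metadata),
        }
        self._models.append(entry)
        self._save()
        return entry


# Two separate instances share the counter through the file.
with tempfile.TemporaryDirectory() as tmp:
    registry_path = os.path.join(tmp, "registry.json")
    first = FileRegistry(registry_path).create("bert-base-uncased", "1.0.0", {"type": "model"})
    second = FileRegistry(registry_path).create("gpt2", "1.0.0", {"type": "model"})
    ids = (first["id"], second["id"])
```

Because the counter is persisted alongside the entries, IDs in this sketch keep increasing across instances. Two writers that load at the same moment could still produce the same ID, since nothing here locks the file.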

Storage location

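Registry state lives in a single S3 object; each model's ZIP is stored separately under artifacts/model/{model_id}.zip. The registry object is located at: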
Bucket: {S3_BUCKET}
Key: registry/registry.json

Structure:
{
  "models": [
    {
      "id": "1",
      "name": "bert-base-uncased",
      "version": "1.0.0",
      "metadata": { ... all metrics ... }
    }
  ],
  "id_counter": 1
}

Artifact ZIP generation

# From src/api/routers/models.py:415-439
mem_zip = io.BytesIO()
with zipfile.ZipFile(mem_zip, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("source_url.txt", hf_url)

key = f"artifacts/model/{model_id}.zip"
_storage.put_bytes(key, mem_zip.getvalue())
presigned = _storage.presign(key)

Note: The ZIP artifact is minimal for model ingestion. It only contains the source URL, not the actual model weights.
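
The in-memory ZIP step is easy to verify on its own. The sketch below wraps the excerpt's logic in a function and adds a reader, so the round trip can be checked locally. The function names `build_source_zip` and `read_source_url` are illustrative and do not come from the codebase.

```python
import io
import zipfile


def build_source_zip(hf_url: str) -> bytes:
    """Build a ZIP in memory containing only source_url.txt."""
    mem_zip = io.BytesIO()
    with zipfile.ZipFile(mem_zip, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("source_url.txt", hf_url)
    # getvalue() is read after the with-block so the archive is finalized.
    return mem_zip.getvalue()


def read_source_url(zip_bytes: bytes) -> str:
    """Read the source URL back out of a ZIP produced by build_source_zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        return z.read("source_url.txt").decode("utf-8")
```

A consumer that downloads the artifact can use the second function to recover the Hugging Face URL and fetch the weights from there.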

Local storage fallback

Source: src/services/storage.py:56-66

For development without AWS:
# From src/services/storage.py:36-66
LOCAL_MODE = os.getenv("LOCAL_STORAGE", "0") == "1"

if LOCAL_MODE:
    path = os.path.join("/tmp/local-artifacts", key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
else:
    s3.put_object(Bucket=BUCKET, Key=key, Body=data)

Note: Set LOCAL_STORAGE=1 to store artifacts locally at /tmp/local-artifacts/.
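
The local branch can be tried against any directory. This sketch mirrors the write path above and adds a read helper. The root-directory parameter, the function names, and the check that rejects keys escaping the root are assumptions added for illustration and are not part of the documented storage service.

```python
import os
import tempfile


def _resolve(root: str, key: str) -> str:
    """Map a storage key to a path under root, rejecting keys that escape it."""
    root_abs = os.path.realpath(root)
    path = os.path.realpath(os.path.join(root_abs, key))
    if os.path.commonpath([root_abs, path]) != root_abs:
        raise ValueError(f"key escapes storage root: {key!r}")
    return path


def put_bytes_local(root: str, key: str, data: bytes) -> str:
    """Write data under root/key, creating parent directories as needed."""
    path = _resolve(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


def get_bytes_local(root: str, key: str) -> bytes:
    """Read back the bytes stored under root/key."""
    with open(_resolve(root, key), "rb") as f:
        return f.read()


# Round trip in a throwaway directory instead of /tmp/local-artifacts.
with tempfile.TemporaryDirectory() as tmp:
    put_bytes_local(tmp, "artifacts/model/1.zip", b"zip-bytes")
    roundtrip = get_bytes_local(tmp, "artifacts/model/1.zip")
```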

Stage 5: Download

Entry point: GET /artifacts/{artifact_type}/{id}
Source: src/api/routers/models.py:1057

Retrieval flow

# From src/api/routers/models.py:1105-1148
item = _registry.get(artifact_id)
if not item:
    raise HTTPException(status_code=404, detail="Artifact does not exist.")

meta = item.get("metadata") or {}
source_uri = item.get("source_uri") or meta.get("source_uri")
download_url = meta.get("download_url")

return Artifact(
    metadata=ArtifactMetadata(
        name=item["name"],
        id=item["id"],
        type=stored_type,
    ),
    data=ArtifactData(
        url=source_uri,
        download_url=download_url
    )
)

Presigned URLs

Source: src/services/storage.py:80-91

Download URLs are temporary S3 presigned URLs:
# From src/services/storage.py:80-91
def presign(self, key: str, expires: int = 3600) -> str:
    if LOCAL_MODE:
        return f"local://download/{key}"
    
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires,  # 1 hour default
    )

Note: Presigned URLs expire after 3600 seconds (1 hour) by default.
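
A client holding a stored download URL may want to know whether it is still usable before attempting the download. The sketch below reads the expiry out of the URL's query string. It assumes the URL carries either the Signature Version 4 parameters `X-Amz-Date` and `X-Amz-Expires` or the older epoch-based `Expires` parameter; the function names are hypothetical, and `local://` URLs from local mode carry no expiry at all.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def presigned_expiry(url: str) -> Optional[datetime]:
    """Return the UTC expiry encoded in a presigned URL, or None if absent."""
    query = parse_qs(urlparse(url).query)
    if "X-Amz-Date" in query and "X-Amz-Expires" in query:
        signed_at = datetime.strptime(
            query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
        ).replace(tzinfo=timezone.utc)
        return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))
    if "Expires" in query:
        return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    return None


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True when the URL's encoded expiry has passed; False if none is encoded."""
    expiry = presigned_expiry(url)
    if expiry is None:
        return False
    return (now or datetime.now(timezone.utc)) >= expiry
```

When the check reports an expired URL, the client would need a freshly generated one, since the signature cannot be extended after the fact.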

Stage 6: Delete

Entry point: DELETE /artifacts/{artifact_type}/{id}
Source: src/api/routers/models.py:1207

Deletion flow

# From src/api/routers/models.py:1208-1221
ok = _registry.delete(id)
if not ok:
    raise HTTPException(status_code=404, detail="Artifact does not exist.")
return {"status": "deleted", "id": id}

Registry deletion logic

Source: src/services/registry.py:129-136
# From src/services/registry.py:129-136
def delete(self, id_: str) -> bool:
    self._load()
    before = len(self._models)
    self._models = [m for m in self._models if str(m.get("id", "")) != id_]
    if len(self._models) < before:
        self._save()
        return True
    return False

Note: Deletion removes the artifact from registry.json but does not delete the ZIP file from S3. The artifact ZIP remains in storage.
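
Because the ZIP outlives its registry entry, orphaned objects accumulate over time. The following sketch shows one way cleanup could be paired with deletion, using an in-memory list and dict as stand-ins for the registry and the bucket. The function names are hypothetical, and only the key format `artifacts/model/{model_id}.zip` is taken from this page.

```python
from typing import Any, Dict, List


def artifact_zip_key(model_id: str) -> str:
    """Storage key used for a model's ZIP artifact."""
    return f"artifacts/model/{model_id}.zip"


def delete_with_cleanup(
    models: List[Dict[str, Any]], blobs: Dict[str, bytes], id_: str
) -> bool:
    """Remove the registry entry and, if present, its ZIP blob."""
    remaining = [m for m in models if str(m.get("id", "")) != id_]
    if len(remaining) == len(models):
        return False  # nothing matched; leave storage untouched
    models[:] = remaining
    blobs.pop(artifact_zip_key(id_), None)
    return True


def find_orphaned_zips(models: List[Dict[str, Any]], blobs: Dict[str, bytes]) -> List[str]:
    """List ZIP keys whose model ID no longer appears in the registry."""
    live = {artifact_zip_key(str(m.get("id", ""))) for m in models}
    return sorted(k for k in blobs if k.startswith("artifacts/model/") and k not in live)
```

Against real S3, the `blobs.pop` call would correspond to a `delete_object` request on the same key; that wiring is left out here.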

State transitions

Valid transitions

Error states

| Error | Stage | HTTP status | Recovery |
| --- | --- | --- | --- |
| Invalid URL | Ingest | 400 | Fix URL format |
| Low reviewedness | Rating | 424 | Improve model quality |
| S3 write failure | Storing | 500 | Retry ingestion |
| Artifact not found | Download/Delete | 404 | Verify artifact ID |

Registry format

The registry.json file structure:
{
  "models": [
    {
      "id": "1",
      "name": "bert-base-uncased",
      "version": "1.0.0",
      "metadata": {
        "type": "model",
        "net_score": 0.7234,
        "reproducibility": 0.8,
        "reviewedness": 0.89,
        "license": 1.0,
        "code_quality": 0.75,
        "bus_factor": 0.6,
        "ramp_up_time": 0.65,
        "performance_claims": 1.0,
        "dataset_quality": 0.7,
        "dataset_and_code_score": 1.0,
        "treescore": 0.8125,
        "size": {
          "raspberry_pi": 0.5,
          "jetson_nano": 0.75,
          "desktop_pc": 0.92,
          "aws_server": 0.95
        },
        "parents": ["google-bert/bert-base-uncased"],
        "download_url": "https://s3.../artifacts/model/1.zip"
      }
    }
  ],
  "id_counter": 1
}
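
A registry file in this shape can be sanity-checked with a few lines of code. The sketch below flags numeric metadata fields that fall outside the range 0 to 1. It assumes every numeric field under `metadata`, including the nested `size` values, is a score in that range, which matches the clamping shown under Metric execution but is not stated as a schema rule. The function name is hypothetical.

```python
from typing import Any, Dict, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def out_of_range_metrics(entry: Dict[str, Any]) -> List[str]:
    """Return the names of numeric metadata fields outside [0, 1]."""
    bad: List[str] = []
    for key, value in (entry.get("metadata") or {}).items():
        if isinstance(value, dict):
            # Nested scores such as size.raspberry_pi are reported as "size.<device>".
            bad.extend(
                f"{key}.{sub}"
                for sub, v in value.items()
                if _is_number(v) and not 0.0 <= v <= 1.0
            )
        elif _is_number(value) and not 0.0 <= value <= 1.0:
            bad.append(key)
    return bad
```

Non-numeric fields such as `type`, `parents`, and `download_url` are ignored by the check.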

Best practices

Always validate inputs

The ingestion service validates URLs, normalizes IDs, and checks score thresholds before persisting.

Handle failures gracefully

Metric computation failures default to 0.0 without crashing the entire rating process.

Use presigned URLs

Download URLs expire after 1 hour. Generate fresh URLs on each download request.

Monitor S3 writes

Registry writes to S3 can fail. The system logs errors but doesn’t retry automatically.
