Artifact lifecycle

Artifacts in the Trustworthy Model Registry follow a well-defined lifecycle from ingestion to deletion. This page documents each stage and the transitions between them.

Lifecycle overview

Stage 1: Ingest

Entry point: POST /artifact/{artifact_type} Source: src/api/routers/models.py:931

Model ingestion

For model artifacts, the system performs comprehensive ingestion:

# From src/api/routers/models.py:322-396
def _ingest_hf_core(source_url: str) -> Dict[str, Any]:
    # 1. Normalize Hugging Face ID
    hf_id = normalize_hf_id(source_url)
    
    # 2. Fetch license metadata
    hf_license = _fetch_hf_license(hf_id)
    
    # 3. Compute all metrics
    metrics = compute_metrics_for_model(base_resource)
    
    # 4. Extract parent models (lineage)
    parents = extract_parents_from_resource(enriched)
    
    # 5. Create registry entry
    created = _registry.create(mc)
    
    # 6. Generate ZIP artifact
    # 7. Upload to S3
    # 8. Return created artifact

URL normalization

Convert the input URL to a canonical Hugging Face model ID:

# src/utils/hf_normalize.py
"https://huggingface.co/bert-base-uncased"
→ "bert-base-uncased"

License extraction

Fetch license information from Hugging Face API:

# src/api/routers/models.py:296-310
api_url = f"https://huggingface.co/api/models/{hf_id}"
resp = requests.get(api_url, timeout=10)
license = resp.json().get("license", "").lower()

Metric computation

Compute all 13+ metrics via the scoring service:

# src/run.py:284-325
metrics = load_metrics()  # Discover all metric modules

for name, func in metrics.items():
    score, latency = run_with_timeout(func, resource, timeout=90)
    results[name] = (score, latency)

Lineage extraction

Extract parent model references from config.json:

# src/metrics/treescore.py:190-256
cfg = resource.get("config") or {}
candidate_keys = (
    "base_model",
    "teacher_model",
    "parent_model",
    "source_model",
)
parents = [cfg[k] for k in candidate_keys if k in cfg]

Dataset and code ingestion

Dataset and code artifacts follow a simplified flow:

# From src/api/routers/models.py:1008-1043
if artifact_type in ("dataset", "code"):
    mc = ModelCreate(
        name=final_name,
        version="1.0.0",
        card="",
        tags=[],
        metadata={},
        source_uri=body.url,
    )
    created = _registry.create(mc)
    created["metadata"]["type"] = artifact_type

Dataset and code artifacts are not scored during ingestion. Only model artifacts undergo metric computation.

Stage 2: Reviewedness gate

Source: src/api/routers/models.py:356-363 Before ingestion completes, models must pass a quality threshold:

# From src/api/routers/models.py:351-363
reviewedness = float(metrics.get("reviewedness", 0.0) or 0.0)

if reviewedness < 0.5:
    raise HTTPException(
        status_code=424,
        detail=f"Ingest rejected: reviewedness={reviewedness:.2f} < 0.50",
    )

Gate behavior

Reviewedness score	Result	HTTP status
>= 0.5	Ingestion proceeds	201 Created
< 0.5	Ingestion rejected	424 Failed Dependency

The reviewedness gate prevents low-quality or untrusted models from entering the registry. Models below the threshold must improve their download count, likes, or documentation before ingestion succeeds.

Stage 3: Rating

Source: src/services/scoring.py All model metrics are computed by the ScoringService:

# From src/services/scoring.py:180-292
def rate(self, resource: Any) -> Dict[str, Any]:
    # 1. Normalize to HF id
    hf_id = normalize_hf_id(raw_name)
    
    # 2. Build resource dict
    base_resource = {
        "name": hf_id,
        "url": f"https://huggingface.co/{hf_id}",
        "github_url": None,
        "local_path": None,
    }
    
    # 3. Compute all metrics
    metrics = compute_metrics_for_model(base_resource)
    
    # 4. Normalize size_score to dict format
    # 5. Return ModelRating-compatible dict

Metric execution

Metrics run with timeouts to prevent hangs:

# From src/run.py:298-306
for name, func in metrics.items():
    try:
        score, latency = run_with_timeout(
            func, resource, timeout=90, label=f"metric:{name}"
        )
        score = float(max(0.0, min(1.0, score)))
    except Exception:
        score, latency = 0.0, 0
    
    results[name] = (score, latency)

Metric failures degrade gracefully to (0.0, 0) without crashing the entire rating process.

Stage 4: Storage

Registry persistence

Source: src/services/registry.py The registry stores artifacts in S3 as JSON:

# From src/services/registry.py:90-114
def create(self, m) -> Dict[str, Any]:
    self._load()  # Reload from S3
    self._id_counter += 1
    new_id = str(self._id_counter)
    
    entry = {
        "id": new_id,
        "name": m.name,
        "version": m.version,
        "metadata": dict(m.metadata),
    }
    
    self._models.append(entry)
    self._save()  # Write back to S3
    return entry

Storage location

Artifacts are stored in two S3 objects:

Bucket: {S3_BUCKET}
Key: registry/registry.json

Structure:
{
  "models": [
    {
      "id": "1",
      "name": "bert-base-uncased",
      "version": "1.0.0",
      "metadata": { ... all metrics ... }
    }
  ],
  "id_counter": 1
}

Artifact ZIP generation

# From src/api/routers/models.py:415-439
mem_zip = io.BytesIO()
with zipfile.ZipFile(mem_zip, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("source_url.txt", hf_url)

key = f"artifacts/model/{model_id}.zip"
_storage.put_bytes(key, mem_zip.getvalue())
presigned = _storage.presign(key)

The ZIP artifact is minimal for model ingestion. It only contains the source URL, not the actual model weights.

Local storage fallback

Source: src/services/storage.py:56-66 For development without AWS:

# From src/services/storage.py:36-66
LOCAL_MODE = os.getenv("LOCAL_STORAGE", "0") == "1"

if LOCAL_MODE:
    path = os.path.join("/tmp/local-artifacts", key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
else:
    s3.put_object(Bucket=BUCKET, Key=key, Body=data)

Set LOCAL_STORAGE=1 to store artifacts locally at /tmp/local-artifacts/.

Stage 5: Download

Entry point: GET /artifacts/{artifact_type}/{id} Source: src/api/routers/models.py:1057

Retrieval flow

# From src/api/routers/models.py:1105-1148
item = _registry.get(artifact_id)
if not item:
    raise HTTPException(status_code=404, detail="Artifact does not exist.")

meta = item.get("metadata") or {}
source_uri = item.get("source_uri") or meta.get("source_uri")
download_url = meta.get("download_url")

return Artifact(
    metadata=ArtifactMetadata(
        name=item["name"],
        id=item["id"],
        type=stored_type,
    ),
    data=ArtifactData(
        url=source_uri,
        download_url=download_url
    )
)

Presigned URLs

Source: src/services/storage.py:80-91 Download URLs are temporary S3 presigned URLs:

# From src/services/storage.py:80-91
def presign(self, key: str, expires: int = 3600) -> str:
    if LOCAL_MODE:
        return f"local://download/{key}"
    
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires,  # 1 hour default
    )

Presigned URLs expire after 3600 seconds (1 hour) by default.

Stage 6: Delete

Entry point: DELETE /artifacts/{artifact_type}/{id} Source: src/api/routers/models.py:1207

Deletion flow

# From src/api/routers/models.py:1208-1221
ok = _registry.delete(id)
if not ok:
    raise HTTPException(status_code=404, detail="Artifact does not exist.")
return {"status": "deleted", "id": id}

Registry deletion logic

Source: src/services/registry.py:129-136

# From src/services/registry.py:129-136
def delete(self, id_: str) -> bool:
    self._load()
    before = len(self._models)
    self._models = [m for m in self._models if str(m.get("id", "")) != id_]
    if len(self._models) < before:
        self._save()
        return True
    return False

Deletion removes the artifact from registry.json but does not delete the ZIP file from S3. The artifact ZIP remains in storage.

State transitions

Valid transitions

Error states

Error	Stage	HTTP status	Recovery
Invalid URL	Ingest	400	Fix URL format
Low reviewedness	Rating	424	Improve model quality
S3 write failure	Storing	500	Retry ingestion
Artifact not found	Download/Delete	404	Verify artifact ID

Registry format

The registry.json file structure:

{
  "models": [
    {
      "id": "1",
      "name": "bert-base-uncased",
      "version": "1.0.0",
      "metadata": {
        "type": "model",
        "net_score": 0.7234,
        "reproducibility": 0.8,
        "reviewedness": 0.89,
        "license": 1.0,
        "code_quality": 0.75,
        "bus_factor": 0.6,
        "ramp_up_time": 0.65,
        "performance_claims": 1.0,
        "dataset_quality": 0.7,
        "dataset_and_code_score": 1.0,
        "treescore": 0.8125,
        "size": {
          "raspberry_pi": 0.5,
          "jetson_nano": 0.75,
          "desktop_pc": 0.92,
          "aws_server": 0.95
        },
        "parents": ["google-bert/bert-base-uncased"],
        "download_url": "https://s3.../artifacts/model/1.zip"
      }
    }
  ],
  "id_counter": 1
}

Best practices

Always validate inputs

The ingestion service validates URLs, normalizes IDs, and checks score thresholds before persisting.

Handle failures gracefully

Metric computation failures default to 0.0 without crashing the entire rating process.

Use presigned URLs

Download URLs expire after 1 hour. Generate fresh URLs on each download request.

Monitor S3 writes

Registry writes to S3 can fail. The system logs errors but doesn’t retry automatically.

Get Started

Core Concepts

Deployment

CLI Tool

Development

Lifecycle overview

Stage 1: Ingest

Model ingestion

Dataset and code ingestion

Stage 2: Reviewedness gate

Gate behavior

Stage 3: Rating

Metric execution

Stage 4: Storage

Registry persistence

Storage location

Artifact ZIP generation

Local storage fallback

Stage 5: Download

Retrieval flow

Presigned URLs

Stage 6: Delete

Deletion flow

Registry deletion logic

State transitions

Valid transitions

Error states

Registry format

Best practices

Always validate inputs

Handle failures gracefully

Use presigned URLs

Monitor S3 writes

Build docs developers (and LLMs) love

Get Started

Core Concepts

Deployment

CLI Tool

Development

Documentation Index

​Lifecycle overview

​Stage 1: Ingest

​Model ingestion

​Dataset and code ingestion

​Stage 2: Reviewedness gate

​Gate behavior

​Stage 3: Rating

​Metric execution

​Stage 4: Storage

​Registry persistence

​Storage location

​Artifact ZIP generation

​Local storage fallback

​Stage 5: Download

​Retrieval flow

​Presigned URLs

​Stage 6: Delete

​Deletion flow

​Registry deletion logic

​State transitions

​Valid transitions

​Error states

​Registry format

​Best practices

Always validate inputs

Handle failures gracefully

Use presigned URLs

Monitor S3 writes

Build docs developers (and LLMs) love

Lifecycle overview

Stage 1: Ingest

Model ingestion

Dataset and code ingestion

Stage 2: Reviewedness gate

Gate behavior

Stage 3: Rating

Metric execution

Stage 4: Storage

Registry persistence

Storage location

Artifact ZIP generation

Local storage fallback

Stage 5: Download

Retrieval flow

Presigned URLs

Stage 6: Delete

Deletion flow

Registry deletion logic

State transitions

Valid transitions

Error states

Registry format

Best practices