Architecture

Overview

The Trustworthy Model Registry is built as a cloud-native, serverless application designed to run on AWS while maintaining compatibility with local development environments. The system follows a modular architecture with clear separation of concerns across API routing, business logic, data persistence, and metric computation.

High-level architecture

AWS components

The system is deployed entirely on AWS using free-tier-compatible services:

AWS Lambda

Stateless execution of the FastAPI backend via the Mangum adapter. Handles all API requests without maintaining server state.

API Gateway

Public REST interface that routes HTTP requests to Lambda. Provides CORS handling and request/response transformation.

Amazon S3

Persistent storage for both the registry metadata (registry.json) and artifact binaries (models, datasets, code).

CloudWatch

Centralized logging and monitoring. All requests, errors, and metrics are captured for observability.

All AWS services are configured to stay within free-tier limits, making the system cost-effective for development and demonstration purposes.

Application structure

The codebase is organized into several key modules:

API layer (`src/api/`)

Handles HTTP routing, request validation, and response formatting.

src/main.py - Application entry point

Initializes the FastAPI application and configures middleware:

src/main.py

from fastapi import FastAPI
from starlette.middleware.cors import CORSMiddleware
from mangum import Mangum

from src.api.routers.models import router as models_router
from src.api.middleware.log_requests import DeepASGILogger

# Create FastAPI app
app = FastAPI(title="SOTeam4P2 API")

# Add logging middleware
app.add_middleware(DeepASGILogger)

# Configure CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://sot4-model-registry-dev.s3-website.us-east-2.amazonaws.com"],
    allow_credentials=False,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Mount API routers
app.include_router(models_router, prefix="/api")

# AWS Lambda handler
handler = Mangum(app)

Key decisions:

Middleware is registered on the application itself, so it wraps every route; its position relative to include_router does not affect coverage
CORS is handled at the application level for consistency
Mangum adapter enables seamless Lambda deployment without code changes

src/api/routers/models.py - API endpoints

Implements all registry endpoints following the OpenAPI specification:

Core endpoints:

POST /artifact/{type} - Create new artifacts
GET /artifacts/{type}/{id} - Retrieve artifact by ID
PUT /artifacts/{type}/{id} - Update artifact metadata
DELETE /artifacts/{type}/{id} - Delete an artifact
POST /artifacts - List/enumerate artifacts with pagination

Model-specific endpoints:

GET /artifact/model/{id}/rate - Compute trust metrics
GET /artifact/model/{id}/lineage - Get dependency graph
POST /artifact/model/{id}/license-check - Validate license compatibility
GET /artifact/{type}/{id}/cost - Estimate operational costs

Search endpoints:

GET /artifact/byName/{name} - Exact name match
POST /artifact/byRegEx - Regex-based search
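
The regex search endpoint can be pictured as a filter over artifact names. The sketch below is a minimal illustration, not the registry's actual handler: `search_by_regex` and the sample entries are hypothetical, and an invalid pattern is turned into a `ValueError` that a route handler could map to a 400 response.

```python
import re
from typing import Any, Dict, List


def search_by_regex(entries: List[Dict[str, Any]], pattern: str) -> List[Dict[str, Any]]:
    """Return entries whose name matches the pattern (hypothetical helper)."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # A route handler could translate this into a 400 Bad Request.
        raise ValueError(f"Invalid regex: {exc}") from exc
    return [e for e in entries if compiled.search(e.get("name", ""))]


# Sample data for illustration only.
entries = [
    {"id": "1", "name": "bert-base-uncased"},
    {"id": "2", "name": "whisper-tiny"},
    {"id": "3", "name": "bert-large-cased"},
]
matches = search_by_regex(entries, r"^bert-.*cased$")
print([m["id"] for m in matches])
```

Compiling the pattern once before filtering keeps the error path separate from the matching loop.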

System endpoints:

GET /health - System health check
DELETE /reset - Reset registry to default state
GET /tracks - List planned feature tracks

src/api/middleware/log_requests.py - Request logging

Custom ASGI middleware for comprehensive request/response logging:

src/api/middleware/log_requests.py

import time
import uuid

from starlette.types import ASGIApp, Receive, Scope, Send


class DeepASGILogger:
    def __init__(self, app: ASGIApp):
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        rid = str(uuid.uuid4())[:8]
        method = scope.get("method")
        path = scope.get("path")

        # Capture request body
        body_bytes = b""
        async def recv_wrapper():
            nonlocal body_bytes
            msg = await receive()
            if msg["type"] == "http.request":
                body_bytes += msg.get("body", b"")
            return msg

        # Capture response
        resp_body = b""
        status_code = None
        async def send_wrapper(message):
            nonlocal resp_body, status_code
            if message["type"] == "http.response.start":
                status_code = message["status"]
            if message["type"] == "http.response.body":
                resp_body += message.get("body", b"")
            await send(message)

        start = time.time()
        await self.app(scope, recv_wrapper, send_wrapper)
        duration_ms = round((time.time() - start) * 1000, 2)

        # Log complete request/response with timing
        print(f"[{rid}] {method} {path} -> {status_code} ({duration_ms}ms)")

Features:

Unique request IDs for tracing
Full request/response body capture
Latency measurement
CloudWatch-compatible output
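
To see what the `send` wrapper observes, the sketch below drives a stub ASGI app by hand using only the standard library. `hello_app` and `run_logged` are hypothetical names introduced for illustration; the interception points mirror the middleware shown above, but this is not the project's code.

```python
import asyncio
from typing import Any, Dict, List

Message = Dict[str, Any]


async def hello_app(scope, receive, send) -> None:
    """Stub ASGI app (hypothetical) that always answers 200 with a short body."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def run_logged(app, method: str, path: str) -> Dict[str, Any]:
    """Drive `app` once and record what a logging wrapper would observe."""
    seen: Dict[str, Any] = {"status": None, "body": b""}
    forwarded: List[Message] = []

    async def receive() -> Message:
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send_wrapper(message: Message) -> None:
        # Same interception points as the middleware: status, then body chunks.
        if message["type"] == "http.response.start":
            seen["status"] = message["status"]
        elif message["type"] == "http.response.body":
            seen["body"] += message.get("body", b"")
        forwarded.append(message)  # stands in for the real `send`

    scope = {"type": "http", "method": method, "path": path}
    await app(scope, receive, send_wrapper)
    seen["forwarded"] = len(forwarded)
    return seen


result = asyncio.run(run_logged(hello_app, "GET", "/health"))
print(result["status"], result["body"])
```

Every message is still passed along after being inspected, so the wrapper records the response without changing it.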

Service layer (`src/services/`)

Contains core business logic isolated from HTTP concerns.

src/services/registry.py - Registry management

Manages artifact persistence and metadata in S3:

src/services/registry.py

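import json
import logging
from typing import Any, Dict, List

import boto3

logger = logging.getLogger(__name__)


class RegistryService: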
    def __init__(self, bucket_name: str, key: str = "registry/registry.json"):
        self.s3 = boto3.client("s3")
        self.bucket = bucket_name
        self.key = key
        self._models: List[Dict[str, Any]] = []
        self._id_counter: int = 0
        self._load()

    def _load(self):
        """Load registry.json from S3 safely."""
        try:
            obj = self.s3.get_object(Bucket=self.bucket, Key=self.key)
            content = obj["Body"].read()
            data = json.loads(content)
            self._models = data.get("models", [])
            self._id_counter = data.get("id_counter", 0)
        except Exception as e:
            logger.error(f"Failed to load registry: {e}")
            self._models = []
            self._id_counter = 0

    def create(self, m) -> Dict[str, Any]:
        """Create new artifact entry."""
        self._load()
        self._id_counter += 1
        new_id = str(self._id_counter)
        entry = {
            "id": new_id,
            "name": getattr(m, "name", "Unnamed Model"),
            "version": getattr(m, "version", "1.0.0"),
            "metadata": dict(m.metadata) if hasattr(m, "metadata") else {},
        }
        self._models.append(entry)
        self._save()
        return entry

Key features:

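Whole-object S3 reads and writes of registry.json (individual S3 PUTs are atomic, but the load-then-save sequence in create() is not guarded against concurrent writers)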
Auto-incrementing artifact IDs
Graceful failure handling
Metadata preservation
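
The load, mutate, save cycle can be tried without AWS by swapping S3 for an in-memory blob. This is a rough sketch under that assumption: `InMemoryStore` and `TinyRegistry` are hypothetical stand-ins, not the project's classes, and only the JSON layout (`models`, `id_counter`) comes from the code above.

```python
import json
from typing import Any, Dict


class InMemoryStore:
    """Hypothetical stand-in for the S3 object holding registry.json."""

    def __init__(self) -> None:
        self.blob = json.dumps({"models": [], "id_counter": 0}).encode()


class TinyRegistry:
    """Illustrative registry: reload, mutate, and write back on every create."""

    def __init__(self, store: InMemoryStore) -> None:
        self.store = store

    def _load(self) -> Dict[str, Any]:
        return json.loads(self.store.blob)

    def _save(self, data: Dict[str, Any]) -> None:
        # The whole document is replaced in one write.
        self.store.blob = json.dumps(data).encode()

    def create(self, name: str, version: str = "1.0.0") -> Dict[str, Any]:
        data = self._load()
        data["id_counter"] += 1
        entry = {"id": str(data["id_counter"]), "name": name, "version": version}
        data["models"].append(entry)
        self._save(data)
        return entry


store = InMemoryStore()
registry = TinyRegistry(store)
first = registry.create("bert-base-uncased")
second = registry.create("whisper-tiny")
print(first["id"], second["id"])
```

Because each call reloads before incrementing, sequential callers get distinct IDs. Two writers that load the same counter value at the same time would still both issue the same ID.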

src/services/scoring.py - Metric computation

Orchestrates all trust metric calculations:

src/services/scoring.py

import os
from typing import Any, Dict

from huggingface_hub import HfApi


class ScoringService:
    def __init__(self):
        token = os.getenv("HUGGINGFACE_HUB_TOKEN")
        self.api = HfApi(token=token)

    def _build_resource(self, model_ref: str) -> Dict[str, Any]:
        """Fetch model metadata from HuggingFace."""
        resource = {
            "name": model_ref,
            "url": f"https://huggingface.co/{model_ref}",
        }
        info = self.api.model_info(model_ref)
        resource["license"] = getattr(info, "license", None)
        resource["tags"] = getattr(info, "tags", [])
        resource["downloads"] = getattr(info, "downloads", 0)
        
        # Read model card
        readme_path = self.api.hf_hub_download(model_ref, "README.md")
        with open(readme_path, "r", encoding="utf-8") as f:
            resource["card_text"] = f.read()
        
        return resource

    def rate(self, resource: Any) -> Dict[str, Any]:
        """Compute all metrics for a model."""
        hf_id = normalize_hf_id(resource.get("name"))
        base_resource = {
            "name": hf_id,
            "url": f"https://huggingface.co/{hf_id}",
            "category": "MODEL",
        }
        # Compute metrics via run.py
        metrics = compute_metrics_for_model(base_resource)
        return metrics

Responsibilities:

Fetch HuggingFace model metadata
Extract GitHub links and dataset references
Invoke individual metric modules
Aggregate results into rating schema
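
Extracting GitHub links from a model card could look roughly like the following. The helper name `extract_github_repos`, the regular expression, and the sample card text are all assumptions made for illustration; the service's real extraction logic is not shown in this document.

```python
import re
from typing import List

# Matches https://github.com/<owner>/<repo>; deeper path segments are ignored.
GITHUB_REPO_RE = re.compile(r"https?://github\.com/([\w.-]+)/([\w.-]+)")


def extract_github_repos(card_text: str) -> List[str]:
    """Return unique `owner/repo` slugs in first-seen order (hypothetical helper)."""
    seen: List[str] = []
    for owner, repo in GITHUB_REPO_RE.findall(card_text):
        # Drop sentence punctuation and a trailing ".git" picked up by the pattern.
        slug = f"{owner}/{repo.rstrip('.').removesuffix('.git')}"
        if slug not in seen:
            seen.append(slug)
    return seen


card = """
Code: https://github.com/google-research/bert
Fork: https://github.com/google-research/bert/tree/master
Training scripts live at https://github.com/huggingface/transformers.
"""
print(extract_github_repos(card))
```

Normalizing to `owner/repo` means a link to a subdirectory and a link to the repository root count as the same reference.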

src/services/storage.py - Artifact storage

Abstracts storage operations with S3/local fallback:

src/services/storage.py

class Storage:
    def put_bytes(self, key: str, data: bytes):
        """Store arbitrary bytes (ZIP, binary, text)."""
        if LOCAL_MODE:
            path = os.path.join(LOCAL_DIR, key)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)
        else:
            s3_client.put_object(Bucket=BUCKET, Key=key, Body=data)

    def presign(self, key: str, expires: int = 3600) -> str:
        """Generate presigned download URL."""
        if LOCAL_MODE:
            return f"local://download/{key}"
        return s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": key},
            ExpiresIn=expires,
        )

Features:

Transparent S3/local storage switching
Presigned URL generation for secure downloads
Binary-safe operations
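
A local-mode round trip can be exercised with a temporary directory. The sketch below assumes a hypothetical `LocalStorage` class that mirrors the `put_bytes` signature above; the `get_bytes` method and the key-escape check are additions for illustration and are not taken from the project.

```python
import os
import tempfile


class LocalStorage:
    """Hypothetical local-only backend mirroring the put_bytes signature."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        path = os.path.normpath(os.path.join(self.root, key))
        # Reject keys such as "../x" that would escape the storage root.
        if os.path.commonpath([self.root, path]) != self.root:
            raise ValueError(f"Key escapes storage root: {key!r}")
        return path

    def put_bytes(self, key: str, data: bytes) -> None:
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get_bytes(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()


with tempfile.TemporaryDirectory() as tmp:
    storage = LocalStorage(os.path.realpath(tmp))
    payload = bytes(range(256))  # every byte value, to exercise binary safety
    storage.put_bytes("artifacts/model/1.zip", payload)
    round_trip = storage.get_bytes("artifacts/model/1.zip")
    print(round_trip == payload)
```

Opening files in `"wb"` and `"rb"` modes is what keeps the round trip byte-exact.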

Metrics layer (`src/metrics/`)

Individual metric implementations following a common interface.

Metric module pattern

All metrics follow the same interface:

import time
from typing import Any, Dict, Tuple


def metric(resource: Dict[str, Any]) -> Tuple[float, int]:
    """Compute metric score and latency.
    
    Args:
        resource: Artifact metadata including name, url, local_path, etc.
    
    Returns:
        (score, latency_ms): Score in [0, 1] and computation time in milliseconds
    """
    start = time.perf_counter()
    
    # Metric computation logic
    score = 0.0
    
    latency = int((time.perf_counter() - start) * 1000)
    return round(score, 3), latency

Available metrics:

ramp_up_time - Documentation completeness and example quality
bus_factor - Team diversity and contributor distribution
performance_claims - Benchmark evidence and performance documentation
license - License compatibility and suitability
dataset_and_code_score - Dataset and code availability
dataset_quality - Dataset documentation and metadata quality
code_quality - GitHub repository code quality signals
reproducibility - Environment files, notebooks, reproduction instructions
reviewedness - PR review coverage from GitHub
treescore - Aggregate score of parent model lineage
size - Hardware compatibility (Raspberry Pi, Jetson, Desktop, Server)
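
Because every metric returns a `(score, latency_ms)` pair, the results can be flattened into `<name>` and `<name>_latency` keys in a single loop. The sketch below uses two toy metrics and an unweighted mean for `net_score`; the toy scoring rules and the averaging are placeholders assumed for illustration, since the real weighting is not described here.

```python
import time
from typing import Any, Callable, Dict, Tuple

Metric = Callable[[Dict[str, Any]], Tuple[float, int]]


def license_metric(resource: Dict[str, Any]) -> Tuple[float, int]:
    """Toy metric (hypothetical): 1.0 when a license is declared, else 0.0."""
    start = time.perf_counter()
    score = 1.0 if resource.get("license") else 0.0
    return score, int((time.perf_counter() - start) * 1000)


def ramp_up_metric(resource: Dict[str, Any]) -> Tuple[float, int]:
    """Toy metric (hypothetical): longer model cards score higher, capped at 1.0."""
    start = time.perf_counter()
    score = min(len(resource.get("card_text", "")) / 1000, 1.0)
    return round(score, 3), int((time.perf_counter() - start) * 1000)


def rate(resource: Dict[str, Any], metrics: Dict[str, Metric]) -> Dict[str, Any]:
    """Run each metric and flatten results into `<name>` / `<name>_latency` keys."""
    rating: Dict[str, Any] = {"name": resource["name"], "category": "MODEL"}
    for name, fn in metrics.items():
        score, latency_ms = fn(resource)
        rating[name] = score
        rating[f"{name}_latency"] = latency_ms
    # Unweighted mean as a placeholder for the real net_score formula.
    scores = [rating[name] for name in metrics]
    rating["net_score"] = round(sum(scores) / len(scores), 3)
    return rating


resource = {"name": "org/model", "license": "mit", "card_text": "x" * 500}
rating = rate(resource, {"license": license_metric, "ramp_up_time": ramp_up_metric})
print(rating)
```

Keeping the metric functions behind one callable type lets new metrics be added to the dictionary without touching the aggregation loop.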

Example: Reproducibility metric

src/metrics/reproducibility.py

import os
import time
from pathlib import Path
from typing import Any, Dict, Tuple


def _score_local_reproducibility(local_dir: str) -> float:
    """Inspect local repository for reproducibility signals."""
    score = 0.0
    p = Path(local_dir)
    
    # requirements.txt → +0.4
    if any(f.name.lower().startswith("requirements") for f in p.iterdir()):
        score += 0.4
    
    # environment.yml → +0.2
    if any(f.name.lower().startswith("environment") for f in p.iterdir()):
        score += 0.2
    
    # Jupyter notebooks → +0.2
    if any(f.suffix.lower() == ".ipynb" for f in p.iterdir()):
        score += 0.2
    
    # README mentions "reproduce" → +0.2
    for readme in p.glob("README*"):
        text = readme.read_text(encoding="utf-8", errors="ignore").lower()
        if "reproduce" in text:
            score += 0.2
            break
    
    return min(score, 1.0)

def metric(resource: Dict[str, Any]) -> Tuple[float, int]:
    start = time.perf_counter()
    local_dir = resource.get("local_dir") or resource.get("local_path")
    
    if local_dir and os.path.isdir(local_dir):
        score = _score_local_reproducibility(local_dir)
    else:
        score = _score_remote_reproducibility(resource)
    
    latency = int((time.perf_counter() - start) * 1000)
    return round(score, 3), latency
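
The local scoring rules can be checked against a throwaway directory. The `score_local` function below is a condensed restatement of those rules written for this example, not the module's own code, and the file contents are made up.

```python
import tempfile
from pathlib import Path


def score_local(local_dir: str) -> float:
    """Condensed restatement of the local scoring rules, for illustration."""
    files = list(Path(local_dir).iterdir())
    names = [f.name.lower() for f in files]
    score = 0.0
    if any(n.startswith("requirements") for n in names):
        score += 0.4
    if any(n.startswith("environment") for n in names):
        score += 0.2
    if any(n.endswith(".ipynb") for n in names):
        score += 0.2
    readmes = [f for f in files if f.name.startswith("README")]
    if any("reproduce" in f.read_text(encoding="utf-8", errors="ignore").lower() for f in readmes):
        score += 0.2
    return round(min(score, 1.0), 3)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "requirements.txt").write_text("numpy\n", encoding="utf-8")
    (root / "README.md").write_text("Run train.py to reproduce the results.\n", encoding="utf-8")
    score = score_local(tmp)
    print(score)  # 0.4 for requirements + 0.2 for the README mention
```

A repository with only a requirements file and a README that mentions reproduction lands at 0.6; adding an environment file and a notebook would bring it to the 1.0 cap.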

Request flow

A typical model ingestion request follows this path:

Request reception

API Gateway receives the HTTP POST request and invokes the Lambda function with the Mangum-wrapped FastAPI app.

Middleware processing

The DeepASGILogger middleware captures request details, assigns a request ID, and starts timing.

Routing

FastAPI routes the request to the appropriate handler in src/api/routers/models.py based on path and method.

Metric computation

The ScoringService fetches model metadata from HuggingFace and GitHub, then invokes all metric modules to compute trust scores.

Registry persistence

The computed metrics and artifact metadata are persisted to S3 via the RegistryService.

Artifact storage

A minimal ZIP artifact is created and uploaded to S3, and a presigned download URL is generated.

Response

The complete artifact response is serialized and returned through the middleware stack, logging the final status and latency.

Data models

The system uses Pydantic models for type safety and validation:

from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field

class ModelCreate(BaseModel):
    name: str
    version: str
    card: str = ""
    tags: List[str] = Field(default_factory=list)
    metadata: Optional[Dict[str, Any]] = None
    source_uri: Optional[str] = None

# Defined before ModelRating, which references it as a field type
class SizeScore(BaseModel):
    raspberry_pi: float
    jetson_nano: float
    desktop_pc: float
    aws_server: float

class ModelRating(BaseModel):
    name: str
    category: str
    net_score: float
    net_score_latency: float
    ramp_up_time: float
    ramp_up_time_latency: float
    bus_factor: float
    bus_factor_latency: float
    performance_claims: float
    performance_claims_latency: float
    license: float
    license_latency: float
    dataset_and_code_score: float
    dataset_and_code_score_latency: float
    dataset_quality: float
    dataset_quality_latency: float
    code_quality: float
    code_quality_latency: float
    reproducibility: float
    reproducibility_latency: float
    reviewedness: float
    reviewedness_latency: float
    tree_score: float
    tree_score_latency: float
    size_score: SizeScore
    size_score_latency: float

Deployment architecture

Local development

# Start with uvicorn for hot reload
uvicorn src.main:app --reload --host 0.0.0.0 --port 8000

Local mode features:

Filesystem-based storage instead of S3 (set LOCAL_STORAGE=1)
SQLite fallback for registry (optional)
Hot reload for rapid development
Full debugging capabilities

AWS Lambda deployment

The application uses Mangum to adapt FastAPI for Lambda:

src/main.py

from fastapi import FastAPI
from mangum import Mangum

app = FastAPI(title="SOTeam4P2 API")
# ... middleware and routes ...

# Lambda handler
handler = Mangum(app)

Deployment process:

Package application with dependencies
Upload to AWS Lambda
Configure API Gateway routes
Set environment variables (S3_BUCKET; AWS_REGION is a reserved variable that the Lambda runtime sets automatically)
Verify /health endpoint

The Mangum adapter handles all ASGI-to-Lambda event translation transparently.

Docker containerization

The Dockerfile supports both development and production:

FROM node:18-bullseye
WORKDIR /app

# Install Python 3.11 via pyenv (the installer honors PYENV_ROOT)
ENV PYENV_ROOT=/opt/pyenv
RUN curl -fsSL https://pyenv.run | bash
RUN /opt/pyenv/bin/pyenv install 3.11.9
RUN /opt/pyenv/bin/pyenv global 3.11.9

# Install dependencies
COPY requirements.txt .
RUN /opt/pyenv/shims/python -m pip install -r requirements.txt

# Copy application
COPY . .
ENV PYTHONPATH=/app:/app/src

CMD ["/opt/pyenv/shims/python", "run.py"]

Build and run:

docker build -t tmr .
docker run -p 8000:8000 -e S3_BUCKET=my-bucket tmr

Security considerations

The system implements multiple security layers:

Input validation: All endpoints use Pydantic schema validation
URL allowlisting: External URLs are validated before fetching
Presigned URLs: S3 downloads use time-limited, signed URLs
CORS policies: Strict origin controls for browser clients
Error handling: No sensitive information leaked in error responses
Abuse monitoring: CloudWatch logs and metrics are used to detect abusive traffic patterns
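
URL allowlisting can be sketched with the standard library alone. The host list and the function name `is_allowed_url` below are assumptions for illustration; the registry's actual allowlist is not given in this document.

```python
from urllib.parse import urlparse

# Assumed allowlist for illustration; the registry's real list is not shown here.
ALLOWED_HOSTS = {"huggingface.co", "github.com"}


def is_allowed_url(url: str) -> bool:
    """Accept only https URLs whose exact hostname is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    # `hostname` is lowercased and excludes any userinfo or port.
    return parsed.hostname in ALLOWED_HOSTS


for candidate in [
    "https://huggingface.co/google-bert/bert-base-uncased",
    "http://github.com/org/repo",
    "https://github.com.evil.example/org/repo",
    "https://github.com@evil.example/org/repo",
]:
    print(candidate, is_allowed_url(candidate))
```

Comparing the parsed hostname exactly, rather than checking whether the URL string contains an allowed domain, is what rejects the last two lookalike URLs.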

Performance characteristics

Latency profile

Typical operation latencies:

Operation	Typical latency
Health check	50-100ms
Artifact retrieval	100-200ms
Model ingestion	30-60s
Metric computation	15-45s
Lineage graph	200-500ms
License check	500-1000ms

Scalability

Horizontal: Lambda auto-scales to handle concurrent requests
Storage: S3 provides unlimited artifact storage
Registry: In-memory caching with S3 backing
Metrics: Computed once at ingestion, cached in metadata

Error handling

The system follows a consistent error handling strategy:

# 400 Bad Request - Invalid input
raise HTTPException(status_code=400, detail="Invalid artifact_type")

# 404 Not Found - Resource doesn't exist
raise HTTPException(status_code=404, detail="Artifact does not exist.")

# 424 Failed Dependency - Ingestion gate rejection
raise HTTPException(
    status_code=424,
    detail=f"Ingest rejected: reviewedness={score:.2f} < 0.50"
)

# 500 Internal Server Error - Unexpected failures
raise HTTPException(status_code=500, detail="Internal rating error.")

All errors are logged to CloudWatch with full context for debugging.
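
The 424 case can be read as a gate applied to the computed scores before anything is stored. The sketch below is one possible shape for that check, assuming a hypothetical `ingest_gate` helper; the 0.50 reviewedness threshold and the messages come from the examples above, while the 201 success code and the handling of a missing score are assumptions.

```python
from typing import Dict, Optional, Tuple

REVIEWEDNESS_THRESHOLD = 0.50  # taken from the 424 example


def ingest_gate(scores: Dict[str, float]) -> Tuple[int, Optional[str]]:
    """Return (status_code, detail); detail is None when ingestion may proceed."""
    score = scores.get("reviewedness")
    if score is None:
        # Assumption: a missing score is treated as an internal rating failure.
        return 500, "Internal rating error."
    if score < REVIEWEDNESS_THRESHOLD:
        return 424, f"Ingest rejected: reviewedness={score:.2f} < {REVIEWEDNESS_THRESHOLD:.2f}"
    # Assumption: 201 Created signals a successful ingest.
    return 201, None


print(ingest_gate({"reviewedness": 0.30}))
print(ingest_gate({"reviewedness": 0.80}))
```

Returning a plain tuple keeps the decision testable without FastAPI; a route handler would raise `HTTPException` for any status of 400 or above.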

Observability

Request logging

All requests logged with ID, method, path, status, latency, and full request/response bodies

Metric latency

Every metric computation includes precise latency measurement in milliseconds

Health endpoint

/health exposes uptime, artifact count, and system status

CloudWatch integration

All logs automatically captured in CloudWatch for analysis and alerting

Enable detailed logging by setting LOG_LEVEL=2 in your environment for DEBUG-level output.
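
One way to wire `LOG_LEVEL` into Python's `logging` module is sketched below. Only "2 means DEBUG" comes from the text above; treating 0 as silent and 1 as INFO, the default of 0, and the `level_from_env` helper are assumptions made for this example.

```python
import logging
import os
from typing import Optional

# Assumed mapping: only the "2" -> DEBUG entry is stated in the documentation.
_LEVELS = {"0": logging.CRITICAL + 1, "1": logging.INFO, "2": logging.DEBUG}


def level_from_env(value: Optional[str]) -> int:
    """Translate a LOG_LEVEL string into a `logging` level, defaulting to silent."""
    return _LEVELS.get((value or "0").strip(), _LEVELS["0"])


logging.basicConfig(
    level=level_from_env(os.getenv("LOG_LEVEL")),
    format="%(levelname)s %(message)s",
)
logging.getLogger("registry").debug("visible only when LOG_LEVEL=2")
```

Unknown values fall back to the silent level, so a typo in the environment does not flood the logs.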
