Containerization & Infrastructure

Why Containerization Matters

Containerization has become the foundation of modern ML systems. By packaging your code, dependencies, and runtime environment into a single unit, containers solve the classic “works on my machine” problem and enable consistent deployments across development, staging, and production. For ML practitioners, containers provide:

Reproducibility: Freeze exact versions of Python, CUDA, system libraries, and your code
Portability: Run the same container locally, on Kubernetes, or on serverless platforms
Isolation: Avoid dependency conflicts between different models or services
Scalability: Deploy multiple replicas and scale horizontally as needed

Core Concepts

Docker Basics

Docker is the most popular containerization platform. A Dockerfile defines your container image as layers:

FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "train.py"]

Each instruction creates a layer that’s cached, making rebuilds faster. For ML workloads, consider:

Using minimal base images to reduce size
Installing heavy dependencies (PyTorch, TensorFlow) in early layers
Copying code last so changes don’t invalidate cached layers
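To see why instruction order matters, here is a toy model of layer caching. It is a sketch, not Docker's actual implementation: it assumes each layer's cache key depends on the previous layer's key, the instruction text, and the content of any files the instruction copies. The helper names (`layer_keys`, `rebuilt_layers`) and the file contents are made up for illustration.

```python
import hashlib


def layer_keys(instructions, files):
    """Return one cache key per instruction.

    Toy model: a key hashes the previous key, the instruction text, and the
    content of the files the instruction copies (if any).
    """
    keys = []
    prev = ""
    for text, copied in instructions:
        h = hashlib.sha256()
        h.update(prev.encode())
        h.update(text.encode())
        for name in copied:
            h.update(files[name].encode())
        prev = h.hexdigest()
        keys.append(prev)
    return keys


def rebuilt_layers(instructions, files_before, files_after):
    """List the instructions whose cache key changes between two builds."""
    before = layer_keys(instructions, files_before)
    after = layer_keys(instructions, files_after)
    return [text for (text, _), b, a in zip(instructions, before, after) if a != b]


# Dependencies first, code last (the order used in the Dockerfile above).
good_order = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "train.py"]),
]
# Code copied before the install step.
bad_order = [
    ("COPY . .", ["requirements.txt", "train.py"]),
    ("RUN pip install -r requirements.txt", []),
]

v1 = {"requirements.txt": "torch", "train.py": "print('v1')"}
v2 = {"requirements.txt": "torch", "train.py": "print('v2')"}  # only code changed

print(rebuilt_layers(good_order, v1, v2))  # only the final COPY is rebuilt
print(rebuilt_layers(bad_order, v1, v2))   # the pip install is rebuilt too
```

In this model, editing train.py with the dependencies-first order invalidates only the last layer, while copying code first forces the dependency install to run again on every code change.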

Multi-stage builds let you compile or download models in one stage and copy only the artifacts to a smaller runtime image.

Container Registries

After building images, push them to a registry:

GitHub Container Registry (ghcr.io): Free for public repos
Docker Hub: Popular but rate-limited for anonymous pulls
AWS ECR / GCP Artifact Registry: Integrated with cloud platforms

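Versioning with tags (e.g., app:v1.2.3) helps track what’s running where. A mutable tag such as app:latest is convenient, but it can point to a different image after every push, so pin deployments to a specific version.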
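One way to keep tags consistent is to generate them in code. The sketch below builds full image references and tags each build with both a release version and a short commit SHA. The helper names (`image_ref`, `release_tags`) and the `myorg/app` repository are hypothetical; the tag rule in the regular expression follows Docker's documented limits (at most 128 characters from letters, digits, underscores, periods, and hyphens, not starting with a period or hyphen).

```python
import re

# Docker tag rule: up to 128 characters from [A-Za-z0-9_.-],
# and the first character may not be "." or "-".
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def image_ref(registry, repository, tag):
    """Build a full image reference such as ghcr.io/myorg/app:v1.2.3."""
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"invalid image tag: {tag!r}")
    return f"{registry}/{repository}:{tag}"


def release_tags(version, git_sha):
    """Return tags for one build: the release version and the short commit SHA."""
    return [f"v{version}", git_sha[:7]]


refs = [
    image_ref("ghcr.io", "myorg/app", tag)
    for tag in release_tags("1.2.3", "0123456789abcdef")
]
print(refs)
```

Both references name the same build: the version tag is easy for people to read, and the commit tag ties the image back to the exact source revision.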

Kubernetes Fundamentals

Kubernetes (K8s) orchestrates containers at scale. Key abstractions for ML:

Pods: The smallest deployable unit, usually one container but optionally with sidecars for logging or proxies
Jobs: Run containers to completion (perfect for training runs)
Deployments: Maintain a desired number of replicas (ideal for serving APIs)
Services: Provide stable networking and load balancing across pods
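To show how a Deployment and a Service fit together, the sketch below builds both manifests as Python dictionaries and prints them as JSON. The `bert-api` name, image, and port numbers are placeholders, and the `deployment` and `service` helpers are hypothetical. The point to notice is that the Service finds its pods purely through labels: its selector must match the labels on the Deployment's pod template.

```python
import json


def deployment(name, image, replicas, port):
    """Deployment that keeps `replicas` copies of one container running."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


def service(name, port, target_port):
    """Service that load-balances across every pod labeled app=<name>."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


dep = deployment("bert-api", "ghcr.io/myorg/bert-api:v1.0", replicas=3, port=8000)
svc = service("bert-api", port=80, target_port=8000)

# A v1 List wraps both objects so they can be applied from a single file.
manifest = {"apiVersion": "v1", "kind": "List", "items": [dep, svc]}
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, saving this output to a file and running kubectl apply -f on it should create both objects.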

Example: Running a Training Job

apiVersion: batch/v1
kind: Job
metadata:
  name: bert-training
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: ghcr.io/myorg/bert-trainer:v1.0
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: Never

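This Job requests one GPU and runs your training container to completion. K8s schedules the pod onto a node with a free GPU and, if the pod fails, starts a replacement pod until the Job’s backoffLimit (6 by default) is reached. Finished Jobs and their pods stay in the cluster until you delete them or set ttlSecondsAfterFinished.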
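Retry and cleanup behavior can be set explicitly on the Job spec. The sketch below generates the same Job with two extra fields: `backoffLimit` and `ttlSecondsAfterFinished` are real fields of the batch/v1 Job API, while the `training_job` helper and the values chosen (2 retries, one hour) are only illustrative.

```python
import json


def training_job(name, image, gpus=1, backoff_limit=2, ttl_seconds=3600):
    """Build a batch/v1 Job with explicit retry and cleanup settings."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of retries before the Job is marked failed.
            "backoffLimit": backoff_limit,
            # Delete the finished Job and its pods after this many seconds.
            "ttlSecondsAfterFinished": ttl_seconds,
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "trainer",
                            "image": image,
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                    "restartPolicy": "Never",
                }
            },
        },
    }


job = training_job("bert-training", "ghcr.io/myorg/bert-trainer:v1.0")
print(json.dumps(job, indent=2))
```

A low retry count is often a sensible choice for expensive GPU training runs, where a deterministic bug would otherwise burn several full attempts before the Job gives up.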

Local Development

Use kind (Kubernetes in Docker) or minikube to run a full K8s cluster locally. This lets you test manifests before deploying to production.

Cloud Platforms

Production K8s is typically managed:

AWS EKS: Integrates with IAM, EBS, and other AWS services
GCP GKE: Autopilot mode handles node management
Azure AKS: Good GPU support for ML workloads

For simpler needs, consider:

Google Cloud Run: Serverless containers (auto-scaling from zero)
AWS Fargate: Serverless compute for ECS/EKS
Railway / Modal: Developer-friendly platforms for pet projects

Serverless options can be more cost-effective for intermittent workloads, while full K8s gives you more control for high-throughput serving.
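The cost trade-off can be made concrete with a little arithmetic. The sketch below compares an always-on node with pay-per-use serverless billing and computes the request volume at which they cost the same. All prices are hypothetical and chosen only to keep the numbers easy to follow; real pricing also includes factors this model ignores, such as memory, minimum billing increments, and cold starts.

```python
def monthly_cost_always_on(hourly_rate, hours_in_month=730):
    """Cost of a node that runs all month, whether or not it serves traffic."""
    return hourly_rate * hours_in_month


def monthly_cost_serverless(rate_per_second, busy_seconds_per_request, requests_per_month):
    """Cost when you pay only for the seconds spent handling requests."""
    return rate_per_second * busy_seconds_per_request * requests_per_month


def break_even_requests(hourly_rate, rate_per_second, busy_seconds_per_request,
                        hours_in_month=730):
    """Monthly request count at which both options cost the same."""
    return monthly_cost_always_on(hourly_rate, hours_in_month) / (
        rate_per_second * busy_seconds_per_request
    )


# Hypothetical prices, not taken from any provider's price list.
NODE_HOURLY = 0.10           # dollars per hour for an always-on node
SERVERLESS_PER_SEC = 0.0001  # dollars per second of request handling
SECONDS_PER_REQUEST = 0.5

print(monthly_cost_always_on(NODE_HOURLY))
print(monthly_cost_serverless(SERVERLESS_PER_SEC, SECONDS_PER_REQUEST, 100_000))
print(break_even_requests(NODE_HOURLY, SERVERLESS_PER_SEC, SECONDS_PER_REQUEST))
```

With these made-up numbers, the always-on node costs about $73 a month, 100,000 requests cost about $5 on the serverless option, and the two break even at roughly 1.46 million requests a month. Below that volume the serverless option is cheaper; above it, the dedicated node wins.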

CI/CD Integration

Automate building and pushing containers:

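# .github/workflows/build.yaml
name: Build Docker Image
on: [push]
env:
  # Placeholder image name; replace myorg with your org or user (lowercase)
  IMAGE: ghcr.io/myorg/myapp
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write  # lets GITHUB_TOKEN push to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - name: Log in to registry
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
      - name: Build image
        run: docker build -t $IMAGE:${{ github.sha }} .
      - name: Push to registry
        run: docker push $IMAGE:${{ github.sha }}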

Popular CI/CD platforms include GitHub Actions, CircleCI, Jenkins, and GitLab CI.

Hands-On Examples

Explore practical containerization in Module 1:

Build ML and web app containers
Deploy to local Kubernetes with kind
Use k9s to monitor resources
Push images to GitHub Container Registry

Next Steps

Data Management: Learn how to store and version datasets
Model Serving: Deploy containers as production APIs
