The MLOps Fundamentals Homework takes you from raw Kaggle data to a fully deployed, monitored machine learning system. Working through four stages, you will implement a DVC-orchestrated training pipeline, track experiments with MLflow, serve predictions through a FastAPI application packaged in Docker, and detect data drift using Kolmogorov-Smirnov tests, much as you would in a production MLOps role.
Introduction
Understand the homework structure, learning objectives, and what you will build.
Setup
Install dependencies, configure environment variables, and download the dataset.
Project Structure
Explore the monorepo layout and understand how the three subsystems connect.
How to Submit
Fork the repo, open a PR, and get a green CI checkmark before the deadline.
The Four Stages
Data Pipeline (6 pts)
Use DVC to orchestrate a four-stage pipeline: load → process → train → evaluate. Split the 550k Spotify Songs dataset at the 2010 streaming era boundary, train Logistic Regression and XGBoost classifiers, and register the champion model in the MLflow Model Registry.
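The split at the 2010 boundary can be pictured with a minimal sketch. It assumes each row carries a year field, that songs released before 2010 form the training split, and that songs from 2010 onward stand in for production data. The column name, the side on which 2010 itself falls, the sample rows, and the function name split_by_year are all illustrative assumptions, not the homework's actual code.

```python
import csv
import io

# Boundary year comes from the homework description; which side 2010 falls on is assumed.
BOUNDARY_YEAR = 2010


def split_by_year(rows, boundary=BOUNDARY_YEAR, year_field="year"):
    """Split rows into (train, production) by release year.

    Rows with year < boundary go to train; the rest go to production.
    The field name "year" is a hypothetical column name.
    """
    train, production = [], []
    for row in rows:
        target = train if int(row[year_field]) < boundary else production
        target.append(row)
    return train, production


# Tiny in-memory CSV with made-up rows standing in for the real dataset.
sample_csv = io.StringIO(
    "track,year,danceability\n"
    "a,1999,0.51\n"
    "b,2009,0.62\n"
    "c,2010,0.70\n"
    "d,2021,0.81\n"
)
train_rows, production_rows = split_by_year(csv.DictReader(sample_csv))
print(len(train_rows), len(production_rows))  # 2 2
```

With a pandas DataFrame the same idea would reduce to a boolean mask on the year column; the plain-Python version is used here only to keep the sketch dependency-free.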
Model Serving (5 pts)
Implement a FastAPI application that exposes GET /health and POST /predict endpoints. Add a logging middleware that writes every prediction request to a JSONL file, and containerize the service with Docker, baking the champion model into the image at build time.
Drift Monitoring (3 pts)
Run Kolmogorov-Smirnov tests across all 12 audio features in two modes: batch (comparing train vs. production CSV splits) and online (comparing training data against live API request logs).
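To make the drift check concrete, here is a minimal pure-Python sketch of what the two-sample Kolmogorov-Smirnov statistic measures: the largest vertical gap between the empirical distribution functions of two samples. The homework itself relies on scipy.stats.ks_2samp, which also returns a p-value; the function name ks_statistic and the sample values below are illustrative assumptions.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    n_ref, n_cur = len(ref_sorted), len(cur_sorted)
    largest_gap = 0.0
    # Both CDFs are step functions, so the gap only changes at observed values.
    for value in ref_sorted + cur_sorted:
        cdf_ref = bisect_right(ref_sorted, value) / n_ref
        cdf_cur = bisect_right(cur_sorted, value) / n_cur
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


# Made-up feature values: an identical sample versus a clearly shifted one.
train_values = [0.1, 0.2, 0.3, 0.4, 0.5]
same_values = [0.1, 0.2, 0.3, 0.4, 0.5]
shifted_values = [0.6, 0.7, 0.8, 0.9, 1.0]

print(ks_statistic(train_values, same_values))     # 0.0
print(ks_statistic(train_values, shifted_values))  # 1.0
```

A statistic near 0 suggests the two samples follow similar distributions, while a value near 1 indicates almost no overlap. In both batch and online modes the comparison is the same; only the source of the second sample changes (a production CSV split versus logged API requests).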
Key Technologies
DVC
Data version control and pipeline orchestration via dvc.yaml and params.yaml.
MLflow
Experiment tracking, model registry, and the @champion alias for deployment.
FastAPI
Async REST API with Pydantic validation and HTTP middleware for request logging.
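The request-logging side can be sketched without any web framework. The helper below shows one way a middleware might append each prediction event to a JSONL file (one JSON object per line) and how a monitoring job could read the events back. The record fields, the file name predictions.jsonl, and both function names are hypothetical; the homework's actual schema may differ.

```python
import json
import tempfile
import time
from pathlib import Path


def append_prediction_log(log_path, features, prediction):
    """Append one prediction event to a JSONL file (one JSON object per line)."""
    record = {
        "timestamp": time.time(),
        "features": features,      # hypothetical field name
        "prediction": prediction,  # hypothetical field name
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def read_prediction_log(log_path):
    """Load every logged event back, skipping blank lines."""
    with open(log_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Demo against a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "predictions.jsonl"
    append_prediction_log(demo_path, {"danceability": 0.7, "energy": 0.9}, 1)
    append_prediction_log(demo_path, {"danceability": 0.2, "energy": 0.3}, 0)
    logged = read_prediction_log(demo_path)

print(len(logged), logged[0]["prediction"])  # 2 1
```

Appending one line per request keeps writes cheap and lets the log be consumed incrementally, which suits a drift check that compares logged requests against the training data.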
Docker
Self-contained container image with the champion model baked in at build time.
scikit-learn & XGBoost
Logistic Regression and XGBoost classifiers with StandardScaler preprocessing.
SciPy KS Test
scipy.stats.ks_2samp to detect distribution shift across 12 audio features.
Your grade depends on a passing CI run on your pull request. Implement all TODOs, push your changes, and verify the GitHub Actions checkmark before submitting.