The homework is graded out of 20 points across 5 sections. This page documents exactly what graders check for each point — read it before you start implementing so you know precisely what is expected at every stage.

Total Points Summary

| Component | Points |
| --- | --- |
| Data Pipeline | 6 |
| Model Serving | 5 |
| Drift Monitoring | 3 |
| Testing & CI/CD | 4 |
| Documentation | 2 |
| TOTAL | 20 |

Rubric by Section

1.1 Dataset Integrity — 1 pt

| Criterion | Points |
| --- | --- |
| `songs.csv` MD5 matches the expected hash (`dvc status songs.csv.dvc` is clean) | 0.5 |
| `dvc repro` produces `data/raw.csv` with the correct column count | 0.5 |

1.2 Process Script — 1.5 pts

| Criterion | Points |
| --- | --- |
| Temporal split is correct: year ≤ 2010 → train, year > 2010 → prod_sim (exact boundary matters) | 0.5 |
| Both `data/train.csv` and `data/prod_sim.csv` are produced | 0.5 |
| Audio features and the `genre` column are present in both outputs | 0.5 |
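
The boundary rule above can be sketched as follows. This is a minimal illustration, assuming the raw data has a numeric `year` column; the function name `temporal_split` and the sample rows are hypothetical and not part of the homework skeleton.

```python
import pandas as pd

BOUNDARY_YEAR = 2010  # from the rubric: year <= 2010 -> train, year > 2010 -> prod_sim


def temporal_split(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split rows by year; the boundary year itself belongs to the training set."""
    train = df[df["year"] <= BOUNDARY_YEAR].reset_index(drop=True)
    prod_sim = df[df["year"] > BOUNDARY_YEAR].reset_index(drop=True)
    return train, prod_sim


# Hypothetical sample rows, used only to exercise the boundary.
sample = pd.DataFrame(
    {"year": [2009, 2010, 2011], "genre": ["rock", "pop", "rap"]}
)
train, prod_sim = temporal_split(sample)
```

Because the two masks are exact complements, every row with a valid year lands in exactly one output, which makes the split easy to check in a unit test.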

1.3 Train Script — 2 pts

| Criterion | Points |
| --- | --- |
| Loads training data and target (`genre`) correctly | 0.5 |
| Trains 2+ different models (Logistic Regression + at least one other) | 0.5 |
| Logs parameters and metrics (accuracy) to MLflow for each model | 0.5 |
| All runs appear in MLflow UI with proper naming and artifacts | 0.5 |

1.4 Evaluate Script — 1 pt

| Criterion | Points |
| --- | --- |
| Finds best model by accuracy metric | 0.5 |
| Registers best model in MLflow Model Registry with `champion` alias | 0.5 |

1.5 DVC Pipeline — 0.5 pts

| Criterion | Points |
| --- | --- |
| `dvc repro` runs without errors and produces all expected outputs | 0.5 |

2.1 API Implementation — 3 pts

| Criterion | Points |
| --- | --- |
| `GET /health` endpoint returns the correct response | 1.0 |
| `POST /predict` accepts a valid `SpotifyFeatures` payload and returns a prediction | 1.0 |
| Request logging is implemented and writes to `logs/api_requests.jsonl` | 1.0 |
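
One way to satisfy the request-logging criterion is to append one JSON object per line. The sketch below is only an illustration: the helper name `log_request`, the record fields (`timestamp`, `features`, `prediction`), and the example feature name are assumptions, since the rubric names the log path but not a schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("logs/api_requests.jsonl")  # path named in the rubric


def log_request(features: dict, prediction: str, log_path: Path = LOG_PATH) -> dict:
    """Append one request as a single JSON line and return the record written."""
    record = {
        # Field names are assumptions, not a required schema.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "features": features,
        "prediction": prediction,
    }
    # Create the logs/ directory on first use so the write cannot fail on a fresh clone.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Opening the file in append mode for each request keeps earlier entries intact and leaves exactly one JSON document per line.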

2.2 Pydantic Models — 1 pt

| Criterion | Points |
| --- | --- |
| `SpotifyFeatures` includes all audio feature fields with correct types | 1.0 |

2.3 Dockerfile — 1 pt

| Criterion | Points |
| --- | --- |
| Dockerfile builds without errors | 0.5 |
| Includes a step to download the `@champion` model from MLflow at build time | 0.5 |

3.1 Batch Mode — 1.5 pts

| Criterion | Points |
| --- | --- |
| Loads `data/train.csv` and `data/prod_sim.csv` correctly in `--mode batch` | 0.5 |
| Kolmogorov-Smirnov test runs for each audio feature (uses `scipy.stats.ks_2samp`) | 0.5 |
| `drift_report.json` contains per-feature `ks_statistic`, `p_value`, `drift_detected`, and an overall `status` | 0.5 |
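
The per-feature test and report shape might look like the sketch below. It is not the reference solution: the signature of `run_ks_analysis`, the 0.05 threshold, the `features` wrapper key, and the status labels are all assumptions. Only the `ks_2samp` call and the four report keys come from the rubric.

```python
from scipy.stats import ks_2samp

ALPHA = 0.05  # assumed significance threshold; the rubric does not fix one


def run_ks_analysis(reference: dict, production: dict, alpha: float = ALPHA) -> dict:
    """Run a two-sample KS test per feature and summarize the results."""
    features = {}
    for name in reference:
        result = ks_2samp(reference[name], production[name])
        features[name] = {
            # Cast to built-in types so the report is JSON-serializable.
            "ks_statistic": float(result.statistic),
            "p_value": float(result.pvalue),
            "drift_detected": bool(result.pvalue < alpha),
        }
    any_drift = any(f["drift_detected"] for f in features.values())
    # Status labels are placeholders; use whatever the assignment specifies.
    status = "drift_detected" if any_drift else "no_drift"
    return {"features": features, "status": status}
```

The inputs here map each feature name to a sequence of values. A DataFrame restricted to the numeric audio feature columns works the same way, since iterating it yields column names.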

3.2 Online Mode — 1.5 pts

| Criterion | Points |
| --- | --- |
| Loads `data/train.csv` and `logs/api_requests.jsonl` correctly in `--mode online` | 0.5 |
| Parses JSONL line-by-line and builds a DataFrame of production features | 0.5 |
| Reuses the same KS analysis logic as batch mode (`run_ks_analysis`) | 0.5 |
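
Line-by-line JSONL parsing can be sketched as below. The helper name `load_logged_features` is hypothetical, and the code assumes each logged record nests the model inputs under a `features` key; adjust the lookup to match whatever your API actually writes.

```python
import json
from pathlib import Path

import pandas as pd


def load_logged_features(log_path: Path) -> pd.DataFrame:
    """Parse a JSONL log line by line into a DataFrame of feature rows."""
    rows = []
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            record = json.loads(line)
            # Assumes the inputs are nested under "features" (not confirmed by the rubric).
            rows.append(record["features"])
    return pd.DataFrame(rows)
```

The resulting DataFrame has one column per feature, so it can be compared against the training data with the same KS routine used in batch mode.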

4.1 Unit Tests — 2 pts

| Criterion | Points |
| --- | --- |
| `pytest data_pipeline/tests` passes (all assertions pass) | 1.0 |
| `pytest model_serving/tests` passes (all assertions pass) | 1.0 |

4.2 Code Quality — 1 pt

| Criterion | Points |
| --- | --- |
| `flake8 .` shows no major style violations (warnings OK, errors not OK) | 1.0 |

4.3 GitHub Actions — 1 pt

| Criterion | Points |
| --- | --- |
| CI pipeline passes on PR (linter + all tests pass; green checkmark in Actions tab) | 1.0 |

5.1 Code Quality — 1 pt

| Criterion | Points |
| --- | --- |
| All TODO comments are addressed; code follows Python style guidelines | 1.0 |

5.2 README & Setup — 1 pt

| Criterion | Points |
| --- | --- |
| README is clear and its instructions can be followed as written | 0.5 |
| Setup works end-to-end (download → process → train → evaluate) | 0.5 |
