Submission is via GitHub pull request. Your grade depends on a passing CI run: implement all TODOs across the three homework stages, get a green checkmark in the Actions tab, then submit the PR URL to your instructor. The CI pipeline runs automatically on every push and checks both linting and tests.
Fork the Repository
Go to the course repository on GitHub. Click Fork (top-right corner) → Create fork. This creates your own copy at your-username/mlops-fundamentals-homework.
Create a Working Branch
All your work must live on a dedicated solution branch. Use your real name so graders can identify your submission at a glance.
Set Up the Environment
Install all dependencies and download the Spotify dataset before touching any source files. See the Setup guide for the full walkthrough — it covers creating the virtual environment, Kaggle authentication, and starting the MLflow server.
Implement the Tasks
Work through each stage in order, following the Implementation Guide below. Each stage feeds the next, so complete them sequentially. Commit your progress regularly so CI runs reflect incremental work.
Open a Pull Request
- Go to your fork on GitHub.
- Click Compare & pull request (GitHub shows this banner automatically after a push).
- Set the base repository to the course repo and base branch to main.
- Title your PR: [Homework] <Your Full Name> (e.g., [Homework] Maria Garcia).
- Paste your completed Submission Checklist in the PR description.
- Click Create pull request.
PR Requirements
Your pull request must meet all three criteria before graders will review it:
- Title format: [Homework] <Your Full Name>
- Base branch: main on the course repo (not your fork)
- CI must pass: a green checkmark in GitHub Actions is required
Implementation Checklist
Complete every item below before opening your PR. Each maps directly to a TODO comment in the source code and carries points in the grading rubric.
Stage 1 — Data Pipeline
- process.py: implement the temporal split (year ≤ 2010 → train.csv, year > 2010 → prod_sim.csv) and save both files with to_csv(..., index=False)
- train.py: encode genre labels with LabelEncoder; scale features with StandardScaler for Logistic Regression (XGBoost skips scaling); loop through the models in params.yaml, starting an mlflow.start_run() for each; log hyperparameters with mlflow.log_params(), accuracy with mlflow.log_metric(), and save model artifacts with mlflow.sklearn.log_model() / mlflow.xgboost.log_model()
- evaluate.py: call client.create_model_version() to register the best run, then client.set_registered_model_alias() to assign the @champion alias
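The temporal split in process.py can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name temporal_split, the out_dir parameter, and the assumption that the dataset has an integer column named year are all hypothetical.

```python
from pathlib import Path
from typing import Tuple

import pandas as pd


def temporal_split(
    df: pd.DataFrame, out_dir: Path, cutoff_year: int = 2010
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split rows on the (assumed) `year` column and write both CSV files."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # Rows up to and including the cutoff year form the training set.
    train = df[df["year"] <= cutoff_year]
    # Later rows simulate data arriving in production.
    prod_sim = df[df["year"] > cutoff_year]

    # index=False keeps the DataFrame index out of the files, so reloading
    # them does not produce a spurious extra column.
    train.to_csv(out_dir / "train.csv", index=False)
    prod_sim.to_csv(out_dir / "prod_sim.csv", index=False)
    return train, prod_sim
```

Splitting by year rather than at random keeps later tracks out of the training data, which is what lets prod_sim.csv stand in for production traffic in the drift stage.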
Stage 2 — Model Serving
- app/main.py, SpotifyFeatures: add all 12 audio feature fields with correct types to the Pydantic model
- app/main.py, GET /health: implement the endpoint returning {"status": "healthy"}
- app/main.py, log_requests middleware: read the request body, parse as JSON, append a timestamped JSON line to logs/api_requests.jsonl, and reconstruct the request before calling call_next
- app/main.py, predict_genre(): load the MLflow model from ./models/, extract feature values from the SpotifyFeatures object, run inference, and return a PredictionResponse with genre and confidence
- Dockerfile: add ARG MLFLOW_TRACKING_URI and the RUN mlflow models download step to pull the @champion model into ./models/ at build time
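The logging step of the log_requests middleware amounts to appending one JSON object per line. Below is a minimal, standard-library-only sketch of that step alone. The helper name append_request_log and the record keys timestamp and payload are assumptions made for illustration; the middleware wiring itself (reading the body and rebuilding the request before call_next) is deliberately left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def append_request_log(payload: Dict[str, Any], log_path: Path) -> Dict[str, Any]:
    """Append one timestamped JSON line for a parsed request body."""
    record = {
        # UTC ISO-8601 timestamp; the key names here are hypothetical.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    # Create the logs/ directory on first use.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" appends, so earlier requests are never overwritten.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

A call such as append_request_log(body, Path("logs/api_requests.jsonl")) would add one line per request, and one-object-per-line output can be read back incrementally without parsing the whole file.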
Stage 3 — Drift Monitoring
- analyze_drift.py, run_ks_analysis(): loop over each feature in features_to_test, run scipy.stats.ks_2samp(train_values, prod_values), flag drift when p_value < 0.05, and populate drift_results["details"][feature] with ks_statistic, p_value, drift_detected, train_mean, and prod_mean; this single function is reused by both batch and online modes
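The loop described for run_ks_analysis() could look roughly like the sketch below. The function is named run_ks_analysis_sketch to make clear it is not the repository's code; its signature, the use of pandas DataFrames as inputs, and the dropna() step are assumptions. Only the ks_2samp call, the 0.05 threshold, and the result keys come from the checklist item.

```python
from typing import Any, Dict, List

import pandas as pd
from scipy.stats import ks_2samp


def run_ks_analysis_sketch(
    train_df: pd.DataFrame,
    prod_df: pd.DataFrame,
    features_to_test: List[str],
    alpha: float = 0.05,
) -> Dict[str, Any]:
    """Run a two-sample KS test per feature and collect the results."""
    drift_results: Dict[str, Any] = {"details": {}}
    for feature in features_to_test:
        # Dropping missing values is an assumption of this sketch.
        train_values = train_df[feature].dropna()
        prod_values = prod_df[feature].dropna()

        result = ks_2samp(train_values, prod_values)

        # Cast to built-in types so the dict can be serialized with json.
        drift_results["details"][feature] = {
            "ks_statistic": float(result.statistic),
            "p_value": float(result.pvalue),
            "drift_detected": bool(result.pvalue < alpha),
            "train_mean": float(train_values.mean()),
            "prod_mean": float(prod_values.mean()),
        }
    return drift_results
```

Because the function takes two data frames and a feature list, the same code path can serve a batch comparison against prod_sim.csv or a frame built from logged API requests.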