The drift_monitoring module detects whether the distribution of Spotify audio features has shifted between the training era (≤2010) and production (>2010 or live API traffic). By comparing the statistical properties of each audio feature between a known baseline and a production dataset, it surfaces when the model may be receiving inputs that look meaningfully different from what it was trained on, which is an early warning that prediction quality may have degraded.
Two Modes
- Batch Mode
- Online Mode
Batch mode compares data/train.csv against data/prod_sim.csv, the two temporal CSV splits produced by process.py. This is a direct comparison of the pre-streaming era (≤2010) against the streaming era (>2010) and represents the expected historical distribution shift.
Audio Features Tested
The two-sample Kolmogorov-Smirnov (KS) test is applied independently to each of the 12 Spotify audio features defined in AUDIO_FEATURES:
- danceability
- energy
- key
- loudness
- mode
- speechiness
- acousticness
- instrumentalness
- liveness
- valence
- tempo
- duration_ms
run_ks_analysis.
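To make the per-feature comparison concrete, the sketch below computes the two-sample KS statistic by hand: the largest gap between the empirical CDFs of the baseline and production samples. It is only an illustration of what the statistic measures, not the module's implementation. The module relies on scipy.stats.ks_2samp, which also returns the p-value used for the drift verdict. The function name ks_statistic and the sample values are assumptions made up for this example.

```python
from bisect import bisect_right


def ks_statistic(baseline, production):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a = sorted(baseline)
    b = sorted(production)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of baseline values <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of production values <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Toy values for a single feature (hypothetical, not taken from the dataset).
train_danceability = [0.40, 0.45, 0.48, 0.50, 0.52]
prod_danceability = [0.60, 0.62, 0.65, 0.68, 0.70]

identical = ks_statistic(train_danceability, train_danceability)
shifted = ks_statistic(train_danceability, prod_danceability)
print(identical, shifted)
```

A statistic near 0 means the two samples look alike, and a value near 1 means they barely overlap. The p-value, which this sketch does not compute, is what turns the statistic into a significance decision.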
Drift Report Output
Both modes write their results to drift_report.json. The values below are illustrative; your actual output will vary based on sample sizes and the features that drift in your dataset.
The details object contains one entry per tested feature with the raw KS statistic, p-value, drift verdict, and the mean of each distribution, so you can see not just whether drift occurred but in which direction (e.g. mean danceability rising from 0.48 to 0.65).
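As a rough illustration of how the per-feature means can be read, the sketch below parses a small report fragment and lists the direction of change for each drifted feature. The key names (ks_statistic, p_value, drift_detected, train_mean, prod_mean), the helper drift_directions, and all numbers other than the 0.48 to 0.65 danceability example are assumptions; check them against your own drift_report.json before relying on them.

```python
import json

# Illustrative report fragment. Key names and values are assumptions for this sketch.
report_text = """
{
  "status": "DRIFT_DETECTED",
  "drift_percentage": 50.0,
  "details": {
    "danceability": {
      "ks_statistic": 0.31, "p_value": 0.0001, "drift_detected": true,
      "train_mean": 0.48, "prod_mean": 0.65
    },
    "tempo": {
      "ks_statistic": 0.02, "p_value": 0.74, "drift_detected": false,
      "train_mean": 118.2, "prod_mean": 118.9
    }
  }
}
"""


def drift_directions(report):
    """Map each drifted feature to the direction its mean moved in production."""
    directions = {}
    for feature, entry in report["details"].items():
        if not entry["drift_detected"]:
            continue  # skip features without a significant shift
        delta = entry["prod_mean"] - entry["train_mean"]
        directions[feature] = "up" if delta > 0 else "down" if delta < 0 else "flat"
    return directions


report = json.loads(report_text)
directions = drift_directions(report)
print(directions)
```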
Status Logic
The overall status field is determined by the proportion of features that showed statistically significant drift:
| drift_percentage | status |
|---|---|
| > 20% | DRIFT_DETECTED |
| ≤ 20% | NORMAL |
The 20% threshold means at least 3 of the 12 features must exhibit significant drift before the pipeline raises an alert: 3 of 12 is 25%, which exceeds 20%, while 2 of 12 is about 16.7%, which does not.
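The threshold rule in the table can be sketched as a small function. This is a minimal illustration under stated assumptions: the function name drift_status and its parameters are hypothetical, and only the 20% cutoff and the two status strings come from the table above.

```python
def drift_status(drifted_features, total_features, threshold_pct=20.0):
    """Return (drift_percentage, status) using a strict greater-than comparison."""
    if total_features <= 0:
        raise ValueError("total_features must be positive")
    drift_percentage = 100.0 * drifted_features / total_features
    # Exactly 20% is still NORMAL; only values strictly above the threshold alert.
    status = "DRIFT_DETECTED" if drift_percentage > threshold_pct else "NORMAL"
    return drift_percentage, status


two_of_twelve = drift_status(2, 12)
three_of_twelve = drift_status(3, 12)
print(two_of_twelve, three_of_twelve)
```

Because the comparison is strict, a run where exactly one in five tested features drifts (20%) would still report NORMAL.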
Dependencies
The drift monitoring module requires only two libraries, listed in drift_monitoring/requirements.txt:
| Package | Minimum Version | Purpose |
|---|---|---|
| pandas | >=2.0.0 | Loading and manipulating CSV and JSONL data |
| scipy | >=1.10.0 | scipy.stats.ks_2samp for the KS two-sample test |
CLI Usage
Both modes are invoked through the same entry point, src/analyze_drift.py, with the --mode flag selecting between them.
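For readers unfamiliar with this pattern, the sketch below shows one way a single entry point could dispatch on a --mode flag using argparse. It is not the actual src/analyze_drift.py: the accepted values batch and online, the handler names, and the returned messages are assumptions for illustration, and the real script may take additional arguments.

```python
import argparse


def build_parser():
    """Build a parser with a single required --mode flag (values are assumed)."""
    parser = argparse.ArgumentParser(
        prog="analyze_drift.py",
        description="Sketch of a two-mode drift analysis entry point.",
    )
    parser.add_argument(
        "--mode",
        choices=["batch", "online"],  # hypothetical values, not confirmed by the source
        required=True,
        help="which comparison to run",
    )
    return parser


def run_batch():
    # Placeholder for the CSV-to-CSV comparison described under Batch Mode.
    return "batch: compare data/train.csv against data/prod_sim.csv"


def run_online():
    # Placeholder for the comparison against live traffic.
    return "online: compare the baseline against live traffic"


def main(argv=None):
    args = build_parser().parse_args(argv)
    handlers = {"batch": run_batch, "online": run_online}
    return handlers[args.mode]()


message = main(["--mode", "batch"])
print(message)
```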
KS Analysis
Dive into the run_ks_analysis implementation: how scipy.stats.ks_2samp is applied per feature, how the drift results dict is built, and the full CLI argument reference.