The homework includes two pytest suites — one for data_pipeline and one for model_serving. Both must pass in their entirety for a green CI checkmark on your pull request. Each suite is designed to verify a specific slice of the pipeline: data ingestion and temporal splitting in the data pipeline, and HTTP contract correctness in the model serving API.

Data Pipeline Tests

The data_pipeline/tests/ directory contains two test files covering the load_data and process_data functions.

test_load.py — 3 tests

All three tests create a temporary directory, write a sample CSV, call load_data, and inspect the output.
Verifies that load_data creates the output CSV file at the specified path (including any intermediate directories). The test passes a two-row sample DataFrame and asserts os.path.exists(output_path) after the call.
Verifies that all source columns pass through the load step unchanged. The sample input has five columns (year, genre, danceability, energy, extra_column) and the test asserts list(result.columns) == list(sample.columns) — order matters.
Verifies that the row count is identical between the source and the output. A three-row sample is written and the test asserts len(result) == 3 after reading the output CSV back.
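The three checks above can be sketched as follows. This is a minimal, dependency-free illustration, not the homework's actual test code: load_data here is a hypothetical stand-in with an assumed (input_path, output_path) signature, the file names and sample values are invented, and the stdlib csv module replaces the DataFrame-based reads the real tests use.

```python
import csv
import os
import tempfile


def load_data(input_path, output_path):
    """Hypothetical stand-in: copy a CSV unchanged, creating parent directories."""
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(input_path, newline="") as src:
        rows = list(csv.reader(src))
    with open(output_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)


def run_load_checks():
    """Mirror the three checks: file exists, columns preserved, row count kept."""
    columns = ["year", "genre", "danceability", "energy", "extra_column"]
    sample = [  # illustrative values only
        [2005, "rock", 0.5, 0.7, "a"],
        [2010, "pop", 0.8, 0.6, "b"],
        [2015, "jazz", 0.4, 0.3, "c"],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        input_path = os.path.join(tmp, "raw.csv")
        # A nested output path exercises the "intermediate directories" case.
        output_path = os.path.join(tmp, "nested", "out", "loaded.csv")
        with open(input_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(columns)
            writer.writerows(sample)

        load_data(input_path, output_path)

        exists = os.path.exists(output_path)
        with open(output_path, newline="") as f:
            result = list(csv.reader(f))
    header, body = result[0], result[1:]
    return exists, header, len(body)
```

Comparing the header as a list rather than a set keeps the column-order requirement visible: a load step that reorders columns would fail the check.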
Run the data pipeline tests (both files) with:
pytest data_pipeline/tests/
pytest data_pipeline/tests/ -v  # verbose output

test_process.py — 3 tests

All three tests create a temporary directory, write a 14-column sample CSV with all audio features, call process_data, and inspect the resulting train/prod splits.
Verifies rows are split correctly at year_threshold=2010. The sample contains years [2005, 2008, 2010, 2012, 2015]. After calling process_data:
  • train.csv must have 3 rows (years ≤ 2010: 2005, 2008, 2010)
  • prod.csv must have 2 rows (years > 2010: 2012, 2015)
  • All values in train_df["year"] must satisfy <= 2010
  • All values in prod_df["year"] must satisfy > 2010
Verifies all 12 audio feature columns are present in both output splits. The test iterates over ["danceability", "energy", "key", "loudness", "mode", "speechiness", "acousticness", "instrumentalness", "liveness", "valence", "tempo", "duration_ms"] and asserts each column appears in both train_df.columns and prod_df.columns.
Verifies the exact boundary behaviour — year=2010 goes to train, year=2011 goes to prod. The sample contains years [2009, 2010, 2011, 2012]. After splitting:
  • train.csv must have exactly {2009, 2010} in its year column
  • prod.csv must have exactly {2011, 2012} in its year column
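The split rule these tests check can be sketched in a few lines. This is an assumption-laden illustration rather than the expected process_data implementation: split_by_year is a hypothetical helper name, and rows are plain dictionaries instead of DataFrames.

```python
def split_by_year(rows, year_threshold=2010):
    """Send rows with year <= threshold to train and later years to prod."""
    train = [row for row in rows if row["year"] <= year_threshold]
    prod = [row for row in rows if row["year"] > year_threshold]
    return train, prod


# Boundary sample from the third test: 2010 stays in train, 2011 goes to prod.
boundary_rows = [{"year": y} for y in [2009, 2010, 2011, 2012]]
train, prod = split_by_year(boundary_rows)
train_years = {row["year"] for row in train}
prod_years = {row["year"] for row in prod}

# Sample from the first test: three rows at or below 2010, two rows after it.
first_rows = [{"year": y} for y in [2005, 2008, 2010, 2012, 2015]]
first_train, first_prod = split_by_year(first_rows)
```

The two conditions are complementary (<= and >), so every row lands in exactly one split. Using < for train would silently move the year 2010 rows into neither split or the wrong one, which is what the boundary test is designed to catch.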
The same commands cover test_process.py, since both test files live in data_pipeline/tests/:
pytest data_pipeline/tests/
pytest data_pipeline/tests/ -v

Model Serving Tests

The model_serving/tests/ directory contains test_api.py, which uses FastAPI’s TestClient to send HTTP requests directly to the application without starting a live server.

test_api.py — 3 tests

Sends GET /health and asserts:
  • HTTP status code 200
  • Response body exactly {"status": "healthy"}
Sends POST /predict with a complete 12-field payload:
{
  "danceability": 0.7,
  "energy": 0.8,
  "key": 5,
  "loudness": -5.0,
  "mode": 1,
  "speechiness": 0.05,
  "acousticness": 0.1,
  "instrumentalness": 0.0,
  "liveness": 0.2,
  "valence": 0.6,
  "tempo": 120.0,
  "duration_ms": 240000
}
Asserts HTTP status code 200 and that the response JSON contains both "genre" and "confidence" keys.
Sends POST /predict with only {"danceability": 0.7} — all 11 other required fields are absent. Asserts HTTP status code 422 (Unprocessable Entity), which FastAPI returns automatically when Pydantic validation fails.
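When a request unexpectedly comes back with 422, it can help to check which fields the payload lacks before sending it. The helper below is a hypothetical debugging aid: missing_fields and REQUIRED_FIELDS are names invented here and are not part of the homework code. It uses the 12 field names from the payload shown above and does not replicate Pydantic's type validation.

```python
# The 12 audio feature fields from the complete /predict payload.
REQUIRED_FIELDS = (
    "danceability", "energy", "key", "loudness", "mode", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
    "duration_ms",
)


def missing_fields(payload):
    """Return the required field names that are absent from the payload."""
    return [name for name in REQUIRED_FIELDS if name not in payload]


complete_payload = {
    "danceability": 0.7, "energy": 0.8, "key": 5, "loudness": -5.0,
    "mode": 1, "speechiness": 0.05, "acousticness": 0.1,
    "instrumentalness": 0.0, "liveness": 0.2, "valence": 0.6,
    "tempo": 120.0, "duration_ms": 240000,
}
incomplete_payload = {"danceability": 0.7}

missing_from_complete = missing_fields(complete_payload)
missing_from_incomplete = missing_fields(incomplete_payload)
```

For the incomplete payload the helper reports the 11 absent fields, matching the case the third test expects the API to reject.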
TestClient wraps the ASGI app directly, so no server needs to be running. Run the suite with:
pytest model_serving/tests/
pytest model_serving/tests/ -v

Running All Tests

To run both suites in a single command — exactly as CI does:
pytest data_pipeline/tests model_serving/tests

Code Style

The project uses flake8 for linting. The .flake8 configuration at the repository root sets:
  • max-line-length = 100
  • extend-ignore = E203 (whitespace before ":"), suppressed for numpy/pandas slice syntax
  • per-file-ignores: model_serving/app/main.py: F401 — scaffold imports in main.py are provided for students to use in their TODO implementations and are therefore exempt from the unused-import rule.
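Written out as a file, the three settings above might look like the string in the sketch below. The exact layout of the repository's .flake8 is an assumption; only the option names and values come from the list above. The snippet parses the text with the stdlib configparser module as a quick well-formedness check.

```python
import configparser

# Assumed layout of the .flake8 file; the real file may be formatted differently.
FLAKE8_CONFIG = """\
[flake8]
max-line-length = 100
extend-ignore = E203
per-file-ignores =
    model_serving/app/main.py: F401
"""

parser = configparser.ConfigParser()
parser.read_string(FLAKE8_CONFIG)

max_line_length = parser.getint("flake8", "max-line-length")
extend_ignore = parser.get("flake8", "extend-ignore")
# The indented line is a continuation of the value, so strip the leading newline.
per_file_ignores = parser.get("flake8", "per-file-ignores").strip()
```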
Run the linter with:
flake8 .
A clean flake8 run with no errors is worth 1 point in the grading rubric. Fix all reported issues before opening your pull request.
