Data Monitoring
While system monitoring tracks infrastructure health, ML-specific monitoring focuses on model behavior, data quality, and prediction reliability. This includes drift detection, outlier identification, and performance degradation tracking.
Why ML Monitoring Matters
Machine learning models face unique challenges in production:
Distribution Shift
Input data distributions change over time, causing models to perform poorly on new data
Concept Drift
The relationship between inputs and outputs changes, invalidating learned patterns
Data Quality
Missing values, outliers, or schema changes can cause silent failures
Model Degradation
Performance slowly declines as the world changes, often going unnoticed
Types of Drift
Covariate Shift (Data Drift)
The distribution of input features changes: P(X) changes, but P(Y|X) remains the same.
Example: A credit scoring model trained on pre-pandemic data sees different income distributions post-pandemic.
Prior Probability Shift (Label Drift)
The distribution of the target variable changes: P(Y) changes, but P(X|Y) remains the same.
Example: Fraud detection during a holiday shopping season sees more fraud attempts.
Concept Drift
The relationship between inputs and outputs changes: P(Y|X) changes.
Example: User preferences change, making an old recommendation model obsolete.
Monitoring Tools
Evidently
Evidently is an open-source library for ML monitoring:
- Generate interactive HTML reports
- Calculate drift metrics
- Profile data quality
- Track model performance
- No infrastructure required (can run as a Python script)
Seldon Core
Seldon Core is a model serving platform with built-in analytics:
- Outlier detection using Alibi Detect
- Drift detection in production
- Explainability with Alibi Explain
- Integration with Kubernetes and MLServer
Alibi Detect
Alibi Detect provides algorithms for:
- Drift detection (KS test, MMD, Chi-squared)
- Outlier detection (Isolation Forest, Mahalanobis distance)
- Online and offline detection modes
WhyLogs
WhyLogs offers lightweight data logging:
- Efficient statistical profiling
- Minimal storage overhead
- Streaming-friendly
Seldon Core Setup
Seldon Core v2 provides a complete platform for model serving with monitoring capabilities.
Architecture
Prerequisites
Seldon Core v2 requires:
- Ansible and Python packages
- Kubernetes cluster (kind recommended)
- CLI tools (kubectl, seldon)
Installation
The Ansible playbooks handle all the complexity of setting up Seldon Core, including namespaces, RBAC, and dependencies.
Basic Example: Iris Classification
Test the installation with a simple model:
Drift Detection Example
Seldon’s income classification example demonstrates drift and outlier detection.
Load Models and Detectors
Drift Detector Configuration
- Uses Kolmogorov-Smirnov (KS) test for continuous features
- Chi-squared test for categorical features
- Compares production data to reference distribution
- Returns drift scores and p-values
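The two tests listed above can be sketched with the Python standard library alone. This illustrates the statistics only; it is not Seldon's or Alibi Detect's implementation. The function names, the synthetic samples, and the asymptotic p-value approximation are assumptions made for the example. In practice a library routine such as `scipy.stats.ks_2samp` or an Alibi Detect detector would normally do this work.

```python
import math
import random


def ks_statistic(reference, production):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    ref, prod = sorted(reference), sorted(production)
    n, m = len(ref), len(prod)
    i = j = 0
    gap = 0.0
    while i < n and j < m:
        x = min(ref[i], prod[j])
        # Advance both samples past x so tied values are handled together.
        while i < n and ref[i] <= x:
            i += 1
        while j < m and prod[j] <= x:
            j += 1
        gap = max(gap, abs(i / n - j / m))
    return gap


def ks_p_value(d, n, m):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The alternating series is numerically unreliable here; p is ~1.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * total))


def chi2_statistic(ref_counts, prod_counts):
    """Chi-squared homogeneity statistic for one categorical feature."""
    ref_total = sum(ref_counts.values())
    prod_total = sum(prod_counts.values())
    grand_total = ref_total + prod_total
    stat = 0.0
    for cat in set(ref_counts) | set(prod_counts):
        col_total = ref_counts.get(cat, 0) + prod_counts.get(cat, 0)
        for counts, row_total in ((ref_counts, ref_total), (prod_counts, prod_total)):
            expected = row_total * col_total / grand_total
            if expected > 0:
                stat += (counts.get(cat, 0) - expected) ** 2 / expected
    return stat


# Synthetic continuous feature: production mean shifted by one standard deviation.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(500)]
production = [rng.gauss(1.0, 1.0) for _ in range(500)]

d = ks_statistic(reference, production)
p = ks_p_value(d, len(reference), len(production))
print(f"KS statistic={d:.3f}, p-value={p:.4f}, drift={p < 0.05}")

# Synthetic categorical feature given as category counts.
chi2 = chi2_statistic({"a": 50, "b": 50}, {"a": 80, "b": 20})
print(f"chi-squared statistic={chi2:.2f}")
```

The chi-squared p-value is left out because it needs the chi-squared distribution function, which the standard library does not provide.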
Outlier Detector Configuration
Create Pipeline
Combine models into a pipeline:
Send Test Data
Use the test client to send normal and anomalous data. The responses include:
- Model predictions
- Outlier detection results (is_outlier scores)
- Drift detection metrics (after batch size is reached)
Evidently for Drift Detection
Evidently provides an easy way to generate drift reports. A report includes:
- Feature-by-feature drift scores
- Statistical tests (KS, Chi-squared, etc.)
- Distribution visualizations
- Data quality metrics (missing values, duplicates, etc.)
Drift Detection in Pipelines
Integrate Evidently into your ML pipeline:
Monitoring Strategy
Design a comprehensive monitoring plan:
1. Define Metrics
Input Monitoring
- Feature distributions
- Missing value rates
- Outlier frequency
- Data schema compliance
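As a rough illustration of two of these input checks, missing value rates and schema compliance, the sketch below validates a batch of records held as plain dictionaries. The schema, column names, and sample batch are hypothetical and exist only for the example; a real pipeline would more likely use a data validation or profiling library.

```python
# Hypothetical schema: column name -> accepted Python types.
EXPECTED_SCHEMA = {
    "age": (int, float),
    "income": (int, float),
    "country": (str,),
}


def missing_rates(records, schema):
    """Share of records in which each expected column is absent or None."""
    if not records:
        return {col: 0.0 for col in schema}
    return {
        col: sum(1 for row in records if row.get(col) is None) / len(records)
        for col in schema
    }


def schema_violations(records, schema):
    """List (row index, message) pairs for unexpected columns or wrong types."""
    problems = []
    for idx, row in enumerate(records):
        extra = sorted(set(row) - set(schema))
        if extra:
            problems.append((idx, f"unexpected columns: {extra}"))
        for col, types in schema.items():
            value = row.get(col)
            if value is not None and not isinstance(value, types):
                problems.append((idx, f"{col}: got {type(value).__name__}"))
    return problems


# Made-up batch: one missing income, one age that arrived as a string.
batch = [
    {"age": 30, "income": None, "country": "UA"},
    {"age": "41", "income": 5000.0, "country": "US"},
]

rates = missing_rates(batch, EXPECTED_SCHEMA)
violations = schema_violations(batch, EXPECTED_SCHEMA)
print(rates)
print(violations)
```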
Output Monitoring
- Prediction distributions
- Confidence scores
- Class balance (for classification)
- Output range (for regression)
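One possible way to track prediction distributions and class balance is to compare the share of each predicted class in production against a reference window. The sketch below uses the Population Stability Index (PSI), a technique this page does not name and that is swapped in here purely as an illustration. The function names and the sample predictions are assumptions.

```python
import math
from collections import Counter


def class_shares(predictions):
    """Fraction of predictions that fall into each class."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: count / total for label, count in counts.items()}


def population_stability_index(ref_shares, prod_shares, eps=1e-6):
    """PSI between two class distributions; 0 means identical shares."""
    psi = 0.0
    for label in set(ref_shares) | set(prod_shares):
        # Clamp to eps so a class missing from one window does not break the log.
        p = max(prod_shares.get(label, 0.0), eps)
        q = max(ref_shares.get(label, 0.0), eps)
        psi += (p - q) * math.log(p / q)
    return psi


# Made-up predictions: the reference window is balanced, production is skewed.
reference_preds = ["a"] * 50 + ["b"] * 50
production_preds = ["a"] * 80 + ["b"] * 20

psi = population_stability_index(
    class_shares(reference_preds), class_shares(production_preds)
)
print(f"PSI={psi:.4f}")
```

What counts as a worrying PSI value depends on the model and should be calibrated on historical data rather than taken from a fixed rule.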
Performance Monitoring
- Accuracy, precision, recall (when ground truth available)
- Prediction-outcome correlation
- Business metrics (conversion rate, revenue, etc.)
System Monitoring
- Latency (p50, p95, p99)
- Throughput (requests per second)
- Error rates
- Resource usage
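The latency, throughput, and error-rate metrics above can be derived from a request log. The following is a minimal sketch that assumes a hypothetical log format (one dictionary per request with `latency_ms` and `status`) and a nearest-rank percentile definition; production systems usually obtain these numbers from a metrics backend instead.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests, window_seconds):
    """Latency percentiles, throughput, and error rate for one time window."""
    latencies = [r["latency_ms"] for r in requests]
    # Treat HTTP 5xx responses as errors (an assumption of this sketch).
    errors = sum(1 for r in requests if r["status"] >= 500)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "throughput_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
    }


# Synthetic log: 100 requests in a 10-second window, the slowest 5 failed.
requests = [
    {"latency_ms": float(i), "status": 500 if i > 95 else 200}
    for i in range(1, 101)
]

summary = summarize(requests, window_seconds=10)
print(summary)
```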
2. Ground Truth Collection
Ground truth is essential for measuring actual performance:
- Delayed feedback: Collect outcomes days or weeks later
- User feedback: Thumbs up/down, corrections
- A/B testing: Compare model variants
- Manual labeling: Sample and label production data
- Proxy metrics: Use correlated signals when direct labels unavailable
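With delayed feedback, predictions are logged at serving time and joined with labels once they arrive. The sketch below shows one way to do that join, assuming each prediction is stored under a request ID. The IDs, labels, and function name are hypothetical.

```python
def evaluate_with_delayed_labels(predictions, labels):
    """Join logged predictions with late-arriving labels by request ID."""
    matched = [
        (pred, labels[request_id])
        for request_id, pred in predictions.items()
        if request_id in labels
    ]
    # Coverage: how much of the logged traffic has ground truth so far.
    coverage = len(matched) / len(predictions) if predictions else 0.0
    # Accuracy is only defined on the labeled subset.
    accuracy = (
        sum(1 for pred, truth in matched if pred == truth) / len(matched)
        if matched
        else None
    )
    return {"n_labeled": len(matched), "coverage": coverage, "accuracy": accuracy}


# Predictions logged at serving time (request ID -> predicted class).
logged_predictions = {"r1": 1, "r2": 0, "r3": 1, "r4": 0}
# Outcomes collected later; r4 has no label yet.
arrived_labels = {"r1": 1, "r2": 1, "r3": 1}

result = evaluate_with_delayed_labels(logged_predictions, arrived_labels)
print(result)
```

Reporting coverage next to accuracy matters because the labeled subset may not be representative of all traffic.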
3. Alerting Thresholds
Define thresholds for alerts:
4. Remediation Actions
Remediate
- Retrain model on recent data
- Roll back to previous version
- Apply hotfix or feature engineering
- Adjust thresholds or business logic
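To connect alert thresholds with the remediation actions above, monitored metrics can be checked against warning and critical levels and mapped to a suggested action. Every number and every metric-to-action mapping in this sketch is a hypothetical placeholder, not a recommendation from this page. Real thresholds depend on the model and the business context.

```python
# Hypothetical thresholds: metric -> (direction, warning level, critical level).
THRESHOLDS = {
    "drift_p_value": ("below", 0.05, 0.01),
    "missing_rate": ("above", 0.05, 0.20),
    "accuracy": ("below", 0.90, 0.80),
}

# Hypothetical mapping from metric to the first remediation to consider.
SUGGESTED_ACTION = {
    "drift_p_value": "retrain model on recent data",
    "missing_rate": "apply hotfix or feature engineering",
    "accuracy": "roll back to previous version",
}


def severity(metric, value):
    """Return 'critical', 'warning', or None for one metric value."""
    direction, warn, critical = THRESHOLDS[metric]
    if direction == "below":
        if value < critical:
            return "critical"
        return "warning" if value < warn else None
    if value > critical:
        return "critical"
    return "warning" if value > warn else None


def evaluate(metrics):
    """Collect (metric, severity, suggested action) for every breached threshold."""
    alerts = []
    for metric in THRESHOLDS:
        if metric not in metrics:
            continue
        level = severity(metric, metrics[metric])
        if level is not None:
            alerts.append((metric, level, SUGGESTED_ACTION[metric]))
    return alerts


alerts = evaluate({"drift_p_value": 0.003, "missing_rate": 0.08, "accuracy": 0.93})
for metric, level, action in alerts:
    print(f"[{level}] {metric}: consider to {action}")
```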
Best Practices
Monitor Continuously
Don’t wait for complaints. Set up automated monitoring to catch issues early.
Start Simple
Begin with basic metrics (input distributions, latency, errors) before adding complex drift detection.
Use Multiple Methods
Combine statistical tests, business metrics, and manual review for comprehensive monitoring.
Close the Loop
Feed production data back into training to keep models up-to-date.
Additional Resources
- Evidently Documentation
- Seldon Core v2 Docs
- Alibi Detect Examples
- Data Distribution Shifts and Monitoring (Chip Huyen)
- ML Observability Course (Evidently)
- Monitoring Machine Learning Systems (Goku Mohandas)
Next Steps
Practice Tasks
Complete the homework assignments to apply these monitoring concepts