Overview
After model selection, the best model is evaluated on the test set with calibrated decision thresholds to meet business requirements. The evaluation focuses on precision-recall tradeoffs and ROC AUC performance.
Evaluation Metrics
Four primary metrics are computed on the test set:
1. ROC AUC (Area Under ROC Curve)
Measures overall classification performance across all thresholds.
Range: 0.0 to 1.0 (higher is better)
Interpretation:
- 0.5: Random guessing
- 0.7-0.8: Acceptable performance
- 0.8-0.9: Good performance
- 0.9+: Excellent performance
Implementation: src/train.py:193
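As an illustration of what this metric measures, ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python. The function name `roc_auc` and the sample data are hypothetical and are not taken from `src/train.py`, which may well rely on a library routine instead.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")

    # Compare every positive score with every negative score.
    # A tie between a positive and a negative counts as half a win.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = purchase) and predicted probabilities.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of examples, so it is only suitable for small illustrations like this one.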
2. Precision
Proportion of predicted purchases that were actual purchases.
Formula: TP / (TP + FP)
Why it matters: High precision means fewer false alarms (users predicted to purchase who don’t)
Implementation: src/train.py:194
3. Recall
Proportion of actual purchases that were correctly predicted.
Formula: TP / (TP + FN)
Why it matters: High recall means capturing most potential purchasers
Implementation: src/train.py:195
4. F1 Score
Harmonic mean of precision and recall.
Formula: 2 × (precision × recall) / (precision + recall)
Why it matters: Balances precision and recall in a single metric
Implementation: src/train.py:196
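The precision, recall, and F1 formulas above can be checked by hand with a short, dependency-free sketch. The helper `classification_metrics`, its zero-division handling, and the sample data are assumptions made for illustration. They are not the code in `src/train.py`.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Precision, recall, and F1 for scores binarized at `threshold`."""
    preds = [1 if s >= threshold else 0 for s in scores]

    # Confusion-matrix counts for the positive (purchase) class.
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)

    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

# At threshold 0.5: TP = 2, FP = 1, FN = 1.
result = classification_metrics(y_true, scores, threshold=0.5)
print({name: round(value, 3) for name, value in result.items()})
# {'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```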
Precision-Recall Threshold Calibration
The default 0.5 threshold is replaced with a calibrated threshold to meet business precision targets.
Implementation: src/train.py:159-170
Threshold Selection Algorithm
- Generate precision-recall curve at all possible thresholds
- Filter candidates where precision ≥ target_precision
- Select threshold with maximum recall among candidates
- Fallback: If no candidates, use threshold with highest precision
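The selection steps above can be sketched without any dependencies, as follows. This is a minimal illustration, not the code at `src/train.py:159-170`. The function name `calibrate_threshold`, the use of observed scores as candidate thresholds, and the tie-breaking rules are all assumptions.

```python
def calibrate_threshold(y_true, scores, target_precision):
    """Return (threshold, precision, recall) chosen by the steps above."""
    total_positives = sum(1 for y in y_true if y == 1)

    # Step 1: evaluate precision and recall at every observed score.
    curve = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        precision = tp / (tp + fp)  # tp + fp >= 1 because t is an observed score
        recall = tp / total_positives if total_positives else 0.0
        curve.append((t, precision, recall))

    # Step 2: keep thresholds that meet the precision target.
    candidates = [point for point in curve if point[1] >= target_precision]

    if candidates:
        # Step 3: maximize recall (assumed tie-break: higher precision).
        return max(candidates, key=lambda point: (point[2], point[1]))
    # Step 4 (fallback): highest precision (assumed tie-break: higher recall).
    return max(curve, key=lambda point: (point[1], point[2]))


# Hypothetical labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(calibrate_threshold(y_true, scores, target_precision=0.75))
# (0.4, 0.75, 1.0)
```

In this toy data, the thresholds 0.4 and 0.9 both reach the 0.75 precision target. The lower one is selected because it keeps recall at 1.0.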
Business Configuration
Target precision is configured in config.yaml.
Why Calibrate Thresholds?
The default 0.5 threshold may not align with business goals:
- Marketing campaigns: High precision reduces wasted ad spend
- Sales outreach: Focus efforts on likely purchasers
- User experience: Avoid over-targeting uninterested users
Metrics Output Format
All metrics are saved to metrics.json.
Implementation: src/train.py:188-199
Metrics Structure
| Field | Type | Description |
|---|---|---|
| run_id | string | Unique identifier for training run |
| best_model_name | string | Name of selected model |
| calibration | object | Threshold calibration details |
| calibration.type | string | Calibration method ("threshold") |
| calibration.target_precision | float | Target precision from config |
| calibration.threshold | float | Selected decision threshold |
| accuracy | float | Overall classification accuracy |
| roc_auc | float | ROC AUC score |
| precision | float | Precision at calibrated threshold |
| recall | float | Recall at calibrated threshold |
| f1 | float | F1 score at calibrated threshold |
| cv_ranking | array | Cross-validation results for all models |
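To make the field layout in the table concrete, the sketch below assembles a dictionary with the same keys and serializes it to JSON. The helper `build_metrics` is hypothetical. Every value shown is a placeholder, not a real result. The shape of each `cv_ranking` entry is not documented here, so the list is left empty.

```python
import json


def build_metrics(run_id, best_model_name, target_precision, threshold,
                  test_scores, cv_ranking):
    """Assemble a metrics dictionary with the fields listed in the table."""
    return {
        "run_id": run_id,
        "best_model_name": best_model_name,
        "calibration": {
            "type": "threshold",
            "target_precision": target_precision,
            "threshold": threshold,
        },
        "accuracy": test_scores["accuracy"],
        "roc_auc": test_scores["roc_auc"],
        "precision": test_scores["precision"],
        "recall": test_scores["recall"],
        "f1": test_scores["f1"],
        "cv_ranking": cv_ranking,
    }


# Placeholder inputs only; none of these are real results.
metrics = build_metrics(
    run_id="example-run",
    best_model_name="example_model",
    target_precision=0.0,
    threshold=0.0,
    test_scores={"accuracy": 0.0, "roc_auc": 0.0, "precision": 0.0,
                 "recall": 0.0, "f1": 0.0},
    cv_ranking=[],  # entry shape is not specified in this page
)
print(json.dumps(metrics, indent=2))
```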
Artifacts Configuration
Output locations are configured in config.yaml.
Generated Files
- best_model.joblib: Trained scikit-learn pipeline
- threshold.txt: Calibrated threshold value (plain text)
- metrics.json: Complete evaluation metrics (JSON)
- drift_baseline.json: Training data statistics for drift detection
- lineage.json: Data and model provenance information
Model Persistence
The best model and threshold are saved for production use.
Implementation: src/train.py:172-181
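As a rough sketch of how the plain-text threshold could be written and consumed at prediction time, the example below round-trips a value through a `threshold.txt` file and applies it to predicted probabilities. The helper names and the file layout (a single number on one line) are assumptions. Model serialization is left out because this page does not describe it beyond the `best_model.joblib` file name.

```python
import tempfile
from pathlib import Path


def save_threshold(path, threshold):
    """Write the threshold as a single plain-text number."""
    Path(path).write_text(f"{threshold}\n", encoding="utf-8")


def load_threshold(path):
    """Read the threshold back as a float."""
    return float(Path(path).read_text(encoding="utf-8").strip())


def predict_labels(probabilities, threshold):
    """Label an example as a predicted purchase when its probability meets the threshold."""
    return [int(p >= threshold) for p in probabilities]


# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    threshold_path = Path(tmp) / "threshold.txt"
    save_threshold(threshold_path, 0.4)  # hypothetical threshold value
    threshold = load_threshold(threshold_path)

print(predict_labels([0.2, 0.4, 0.9], threshold))  # [0, 1, 1]
```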
Lineage Tracking
Data and model lineage is tracked with SHA256 hashes.
Implementation: src/train.py:201-220
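One way to produce such hashes is with Python's standard `hashlib` module, reading each file in chunks so large datasets do not need to fit in memory. The sketch below is illustrative only. The helpers `sha256_of_file` and `build_lineage`, and the field names in the resulting dictionary, are assumptions and do not reflect the actual contents of lineage.json.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_lineage(data_path, config_path):
    """Collect file hashes under hypothetical field names."""
    return {
        "data_sha256": sha256_of_file(data_path),
        "config_sha256": sha256_of_file(config_path),
    }


# Demonstrate on two small temporary files with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.csv"
    config_file = Path(tmp) / "config.yaml"
    data_file.write_bytes(b"abc")
    config_file.write_bytes(b"key: value\n")
    lineage = build_lineage(data_file, config_file)

print(lineage["data_sha256"])
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Because a hash changes whenever a single byte of the input changes, comparing stored and recomputed digests is enough to tell whether a model was trained on the data and config it claims.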
Lineage Benefits
- Reproducibility: Track exact data and config versions
- Auditability: Verify model provenance
- Debugging: Identify which data produced which model
- Rollback: Match models to their training data
Usage Example
Complete Training Flow
The full training process (src/train.py:116-227):
1. Load configuration and data
2. Split into train/test sets
3. Build preprocessor and models
4. Cross-validate all models
5. Select best model by ROC AUC
6. Generate precision-recall curve
7. Calibrate threshold to target precision
8. Evaluate on test set
9. Save model, threshold, and metrics
10. Track lineage with hashes
Next Steps
Model Selection
Learn about cross-validation and model comparison
Data Loading
Understand the data pipeline