Overview
This page presents the actual evaluation results from the fraud detection model on the validation set. All metrics are extracted directly from the notebook output.
Evaluation Code
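A minimal sketch of how these metrics are typically produced with scikit-learn (not the notebook's actual code), assuming a fitted classifier `model` and validation data `X_val` and `y_val` as placeholder names:

```python
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score

# Placeholder names: `model`, `X_val`, `y_val` stand in for the notebook's objects
y_pred = model.predict(X_val)              # hard 0/1 predictions
y_prob = model.predict_proba(X_val)[:, 1]  # predicted probability of class 1 (fraud)

print(confusion_matrix(y_val, y_pred))
print(classification_report(y_val, y_pred))
print("AUC-ROC:", roc_auc_score(y_val, y_prob))
```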
Confusion Matrix
The confusion matrix shows actual vs. predicted classifications (a code sketch for extracting these counts follows the list below):
| | Predicted: Legitimate | Predicted: Fraud |
|---|---|---|
| Actual: Legitimate | 436 (True Negatives) | 11 (False Positives) |
| Actual: Fraud | 17 (False Negatives) | 129 (True Positives) |
- True Negatives (436): Legitimate transactions correctly identified
- True Positives (129): Fraudulent transactions correctly identified
- False Positives (11): Legitimate transactions incorrectly flagged as fraud
- False Negatives (17): Fraudulent transactions missed by the model
- The model correctly identifies most transactions (436 + 129 = 565 out of 593)
- Only 11 false alarms (legitimate transactions flagged as fraud)
- Only 17 fraud cases missed (11.6% of all fraud)
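For reference, the four cells map onto scikit-learn's confusion matrix layout as follows (a sketch reusing the `y_val` / `y_pred` placeholders above):

```python
from sklearn.metrics import confusion_matrix

# For binary labels 0/1, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_val, y_pred).ravel()
print(tn, fp, fn, tp)  # on this validation set: 436 11 17 129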
Classification Report
Detailed per-class metrics are explained below.
Metrics Explanation
Class 0 (Legitimate Transactions)
- Precision: 0.96 (96%)
  - Of all transactions predicted as legitimate, 96% actually are legitimate
  - Formula: 436 / (436 + 17) = 0.96
  - Very few fraud cases hide among transactions predicted as legitimate
- Recall: 0.98 (98%)
  - Of all actual legitimate transactions, 98% are correctly identified
  - Formula: 436 / (436 + 11) = 0.98
  - Excellent at finding legitimate transactions
- F1-Score: 0.97 (97%)
  - Harmonic mean of precision and recall
  - Balanced performance on legitimate class
- Support: 447
  - Number of legitimate transactions in validation set
Class 1 (Fraudulent Transactions)
- Precision: 0.92 (92%)
  - Of all transactions predicted as fraud, 92% actually are fraud
  - Formula: 129 / (129 + 11) = 0.92
  - When the model flags fraud, it’s usually correct
- Recall: 0.88 (88%)
  - Of all actual fraud cases, 88% are correctly detected
  - Formula: 129 / (129 + 17) = 0.88
  - Catches most fraud, but misses about 12%
- F1-Score: 0.90 (90%)
  - Harmonic mean of precision and recall
  - Strong balanced performance on fraud class
- Support: 146
  - Number of fraudulent transactions in validation set
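The per-class figures above follow directly from the confusion-matrix counts; a small standalone sketch in plain Python, using the numbers from the matrix:

```python
tn, fp, fn, tp = 436, 11, 17, 129

# Class 1 (fraud)
precision_fraud = tp / (tp + fp)    # 129 / 140 ≈ 0.92
recall_fraud = tp / (tp + fn)       # 129 / 146 ≈ 0.88
f1_fraud = 2 * precision_fraud * recall_fraud / (precision_fraud + recall_fraud)  # ≈ 0.90

# Class 0 (legitimate): the error cells swap roles
precision_legit = tn / (tn + fn)    # 436 / 453 ≈ 0.96
recall_legit = tn / (tn + fp)       # 436 / 447 ≈ 0.98
f1_legit = 2 * precision_legit * recall_legit / (precision_legit + recall_legit)  # ≈ 0.97

print(precision_fraud, recall_fraud, f1_fraud)
print(precision_legit, recall_legit, f1_legit)
```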
Overall Metrics
- Accuracy: 0.95 (95%)
  - Overall percentage of correct predictions
  - Formula: (436 + 129) / 593 = 0.95
  - 565 out of 593 transactions correctly classified
- Macro Average: 0.94
  - Simple average of metrics across both classes
  - Treats both classes equally (regardless of support)
- Weighted Average: 0.95
  - Average weighted by support (number of samples per class)
  - More representative for imbalanced datasets
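To illustrate the difference, the two averages can be recomputed from the per-class F1 scores and supports above (a sketch; the classification report applies the same averaging to precision and recall as well):

```python
f1_legit, f1_fraud = 0.97, 0.90
support_legit, support_fraud = 447, 146

# Macro: unweighted mean, each class counts equally
macro_f1 = (f1_legit + f1_fraud) / 2  # ≈ 0.94

# Weighted: mean weighted by the number of samples in each class
weighted_f1 = (f1_legit * support_legit + f1_fraud * support_fraud) / (support_legit + support_fraud)  # ≈ 0.95

print(macro_f1, weighted_f1)
```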
AUC-ROC Score
What is AUC-ROC?
The Area Under the Receiver Operating Characteristic curve measures the model’s ability to distinguish between classes:
- 1.0: Perfect classifier
- 0.5: Random guessing
- < 0.5: Worse than random
The model achieves an AUC-ROC of 0.988 (98.8%), which means:
- Near-perfect discrimination between fraud and legitimate transactions
- The model’s probability scores are highly informative
- 98.8% chance that a randomly chosen fraud case will have a higher predicted probability than a randomly chosen legitimate case
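The last bullet is the standard probabilistic reading of AUC; it can be checked empirically from the predicted probabilities (a sketch reusing the `y_val` / `y_prob` placeholders from the evaluation code above):

```python
import numpy as np

y_true = np.asarray(y_val)
scores = np.asarray(y_prob)

fraud_scores = scores[y_true == 1]
legit_scores = scores[y_true == 0]

# Fraction of (fraud, legitimate) pairs where the fraud case scores higher
# (ties count as half); this equals the AUC-ROC, ≈ 0.988 here.
diffs = fraud_scores[:, None] - legit_scores[None, :]
auc_estimate = np.mean(diffs > 0) + 0.5 * np.mean(diffs == 0)
print(auc_estimate)
```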
Why AUC-ROC Matters for Fraud Detection
- Threshold-independent: Evaluates model quality regardless of classification threshold
- Imbalance-robust: Works well with imbalanced datasets
- Probability calibration: Indicates how well probabilities reflect true risk
- Business flexibility: Allows adjusting thresholds based on cost of false positives vs. false negatives
Performance Summary
| Metric | Value | Interpretation |
|---|---|---|
| Accuracy | 95% | Excellent overall performance |
| Precision (Fraud) | 92% | When flagged as fraud, usually correct |
| Recall (Fraud) | 88% | Catches most fraud cases |
| F1-Score (Fraud) | 90% | Strong balanced performance |
| AUC-ROC | 98.8% | Near-perfect discrimination |
| False Positive Rate | 2.5% | 11 out of 447 legitimate transactions |
| False Negative Rate | 11.6% | 17 out of 146 fraud cases missed |
Model Strengths
- High accuracy (95%): Correctly classifies most transactions
- Excellent AUC-ROC (98.8%): Near-perfect separation between classes
- Strong precision on fraud (92%): Low false alarm rate
- Good recall on fraud (88%): Catches majority of fraud cases
- Balanced performance: Both classes perform well (F1 scores: 0.97 and 0.90)
Areas for Improvement
- Fraud recall (88%):
  - 17 fraud cases missed
  - Could lower the threshold to catch more fraud (at the cost of more false positives)
  - Consider additional features or data sources
- False negatives in fraud:
  - Missing roughly 12% of fraud cases could be costly
  - May need specialized techniques for rare fraud patterns
  - Could implement anomaly detection as a complementary approach
Business Impact
In a fraud detection context:
Costs:
- False Positive: Manual review cost, potential customer friction
- False Negative: Financial loss from undetected fraud
On this validation set:
- 11 false positives: 11 legitimate transactions require manual review
- 17 false negatives: 17 fraud cases slip through (potential losses)
Threshold adjustment (see the sketch after this list):
- Lower threshold (e.g., 0.3): Catch more fraud (↑ recall) but more false alarms (↓ precision)
- Higher threshold (e.g., 0.7): Fewer false alarms (↑ precision) but miss more fraud (↓ recall)
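A sketch of such a threshold sweep, again using the placeholder `y_val` / `y_prob` from above (0.3 and 0.7 are illustrative values, not tuned settings):

```python
from sklearn.metrics import precision_score, recall_score

for threshold in (0.3, 0.5, 0.7):
    y_pred_t = (y_prob >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_val, y_pred_t):.2f}, "
          f"recall={recall_score(y_val, y_pred_t):.2f}")
```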
Validation Set Composition
- Total samples: 593
- Legitimate transactions: 447 (75.4%)
- Fraudulent transactions: 146 (24.6%)
- Class imbalance ratio: ~3:1 (legitimate:fraud)
Conclusion
The fraud detection model demonstrates excellent performance with:
- 95% accuracy
- 98.8% AUC-ROC score
- Strong performance on both classes
Next Steps
For implementation details:
- Model Overview - Architecture and configuration
- Data Preprocessing - Data preparation steps
- Model Training - Training process and strategy