Problem Description
This project implements a fraud detection model to identify fraudulent transactions in a dataset. The model is trained to classify transactions as either fraudulent (FRAUDE=1) or legitimate (FRAUDE=0).
Model Architecture
The model uses a RandomForestClassifier with the following configuration:
- Algorithm: Random Forest (ensemble of decision trees)
- Number of estimators: 200 trees
- Class weight: Balanced (to handle class imbalance)
- Random state: 42 (for reproducibility)
Key Features
Handling Class Imbalance
The dataset exhibits class imbalance between fraudulent and legitimate transactions. To address this:
- class_weight='balanced' automatically sets class weights inversely proportional to class frequencies
- This reduces the model's bias toward the majority class
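The reweighting rule can be checked directly. This is a minimal sketch, assuming scikit-learn and NumPy; the 95/5 label split is hypothetical and not taken from the project's data. It computes the 'balanced' weights by hand as n_samples / (n_classes * class_count) and compares them with scikit-learn's compute_class_weight, which applies the same formula.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 95 legitimate (FRAUDE=0) and 5 fraudulent (FRAUDE=1) rows.
y = np.array([0] * 95 + [1] * 5)
classes = np.unique(y)

# 'balanced' weight for each class: n_samples / (n_classes * count_of_that_class)
manual = len(y) / (len(classes) * np.bincount(y))

# scikit-learn's helper uses the same rule.
sklearn_weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

print(dict(zip(classes.tolist(), manual.round(3).tolist())))  # → {0: 0.526, 1: 10.0}
```

Each fraudulent row therefore counts about 19 times as much as a legitimate one during training. In RandomForestClassifier, class_weight="balanced" computes these weights once from the full training labels; the "balanced_subsample" option recomputes them for each tree's bootstrap sample instead.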
Model Performance
On the validation set, the model achieves:
- Accuracy: 95%
- AUC-ROC Score: 0.988 (98.8%)
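The two metrics are computed from different model outputs: accuracy from hard class predictions, AUC-ROC from the predicted probability of FRAUDE=1. The sketch below illustrates this, assuming scikit-learn; the data comes from make_classification with roughly 5% positives and is only a synthetic stand-in, so its scores will not match the figures above. With imbalanced classes, accuracy is dominated by the majority class, which makes AUC-ROC the more informative of the two.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the transaction data (assumption): about 5% positive class.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42)

# Stratified split keeps the fraud ratio similar in the training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)
model.fit(X_train, y_train)

# Accuracy uses hard labels; AUC-ROC uses the score for the positive class (column 1).
y_pred = model.predict(X_val)
y_score = model.predict_proba(X_val)[:, 1]

accuracy = accuracy_score(y_val, y_pred)
auc = roc_auc_score(y_val, y_score)
print(f"Accuracy: {accuracy:.3f}  AUC-ROC: {auc:.3f}")
```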
Model Initialization Code
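A minimal sketch of the initialization, assuming scikit-learn's RandomForestClassifier and using only the settings listed under Model Architecture; any other parameters are left at their library defaults.

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,         # 200 trees in the ensemble
    class_weight="balanced",  # weight classes inversely to their frequency
    random_state=42,          # fixed seed so bootstrap samples and feature splits are reproducible
)
```

The model is then trained with model.fit(X_train, y_train), where X_train and y_train are hypothetical names for the project's training features and FRAUDE labels.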
Why Random Forest?
Random Forest was chosen for this fraud detection task because:
- Robustness: Handles both numerical and categorical features well (categorical columns still need a numeric encoding before training)
- Non-linearity: Captures complex patterns in transaction data
- Overfitting resistance: Ensemble approach reduces variance
- Feature importance: Provides insights into which features drive predictions
- Class imbalance handling: Built-in support through the class_weight parameter
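The feature importance point can be illustrated with the fitted model's feature_importances_ attribute. A minimal sketch, assuming scikit-learn; the synthetic data and the feature_0 … feature_5 column names are hypothetical placeholders for the real transaction features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, imbalanced stand-in data (assumption); 3 of the 6 features are informative.
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3, weights=[0.95, 0.05], random_state=42
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical column names

model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)
model.fit(X, y)

# Impurity-based importances, averaged over all trees and normalized to sum to 1.
ranked = sorted(
    zip(feature_names, model.feature_importances_), key=lambda pair: pair[1], reverse=True
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favor features with many distinct values; sklearn.inspection.permutation_importance, evaluated on the validation set, is a common cross-check.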