Prerequisites
Before you begin, ensure you have the following installed:
- Python 3.8 - The project requires Python 3.8 for compatibility with specific library versions
- Miniconda or Anaconda - For creating isolated Python environments
- Git - For cloning the repository
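You can confirm these prerequisites from the interpreter you plan to use with a small check script. This is a sketch, not part of the project: the helper name `check_prerequisites` is hypothetical. It checks only three things: that the running interpreter is Python 3.8, and that `conda` and `git` are on your PATH.

```python
import shutil
import sys


def check_prerequisites():
    """Return {prerequisite: True/False} (hypothetical helper, not project code)."""
    return {
        # The project pins Python 3.8, so compare only the major.minor version.
        "python3.8": sys.version_info[:2] == (3, 8),
        # shutil.which returns None when the executable is not found on PATH.
        "conda": shutil.which("conda") is not None,
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```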
Setup Steps
Create conda environment
Create a new conda environment with Python 3.8.
Using a dedicated environment ensures dependency isolation and prevents conflicts with other Python projects.
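For reference, the standard conda commands are `conda create -n <name> python=3.8` followed by `conda activate <name>` in your shell. The sketch below builds and runs the create step from Python. The environment name `fraud-detection` is a hypothetical placeholder. Activation still has to happen in your own shell, because a child process cannot activate an environment for its parent.

```python
import shutil
import subprocess


def conda_create_command(env_name, python_version="3.8"):
    """Build the argv for `conda create`; env_name is whatever name you choose."""
    # -n names the environment; -y answers the confirmation prompt automatically.
    return ["conda", "create", "-n", env_name, f"python={python_version}", "-y"]


def create_env(env_name):
    """Run `conda create`, failing early if conda is not on PATH."""
    conda = shutil.which("conda")
    if conda is None:
        raise RuntimeError("conda not found on PATH; install Miniconda or Anaconda first")
    cmd = conda_create_command(env_name)
    cmd[0] = conda  # use the resolved executable path
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    create_env("fraud-detection")  # hypothetical environment name
    # Activation must happen in your own shell, not in this child process.
    print("Next, run: conda activate fraud-detection")
```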
Install dependencies
Install all required packages from requirements.txt. This will install:
- Flask 1.1.1 (web framework)
- scikit-learn 0.22.1 (ML algorithms)
- XGBoost 0.90 (gradient boosting)
- pandas 0.25.3 (data processing)
- Flask-MonitoringDashboard 3.0.6 (monitoring)
- And 40+ other dependencies
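After installation, you can confirm that the pinned versions listed above are the ones actually installed in the active environment. The following sketch uses the standard library's `importlib.metadata`, which was added in Python 3.8. The `EXPECTED` mapping covers only the five packages named above, so `requirements.txt` remains the authoritative list.

```python
from importlib import metadata

# Pins named in this guide; requirements.txt is the full, authoritative list.
EXPECTED = {
    "Flask": "1.1.1",
    "scikit-learn": "0.22.1",
    "xgboost": "0.90",
    "pandas": "0.25.3",
    "Flask-MonitoringDashboard": "3.0.6",
}


def version_mismatches(expected, lookup=metadata.version):
    """Return {package: installed_version_or_None} for every package that differs."""
    problems = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            problems[name] = installed
    return problems


if __name__ == "__main__":
    bad = version_mismatches(EXPECTED)
    if not bad:
        print("All pinned packages match.")
    for name, installed in bad.items():
        print(f"{name}: expected {EXPECTED[name]}, found {installed}")
```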
Train a Model
Before making predictions, you need to train a model on your insurance claims data.
Using the API
Send a POST request to the /train endpoint.
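The exact payload fields are defined by the API and not reproduced here, so check the API Reference for them. The following standard-library sketch builds the request. It assumes the server listens on `http://localhost:5001` (the port mentioned under Troubleshooting) and accepts a JSON body. The `folderPath` key and its value are hypothetical placeholders.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # assumption: default port from Troubleshooting


def build_train_request(payload, base_url=BASE_URL):
    """Build a POST request to /train with a JSON body (payload shape is assumed)."""
    return urllib.request.Request(
        url=f"{base_url}/train",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical payload; replace with the fields the /train endpoint expects.
    req = build_train_request({"folderPath": "Training_Batch_Files"})
    # Training can take several minutes, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=600) as resp:
        print(resp.status, resp.read().decode("utf-8"))
```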
Training may take several minutes depending on the dataset size. The system will:
- Validate and preprocess the data
- Perform K-Means clustering
- Train XGBoost and SVM models per cluster
- Select and save the best model for each cluster
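You can think of the per-cluster selection step as picking the highest validation score within each cluster. This is an illustrative sketch only, not the project's code. The score metric, the model names used as dictionary keys, and the `best_model_per_cluster` helper are all assumptions, and the scores in the usage line are made-up toy values.

```python
def best_model_per_cluster(scores):
    """Pick the highest-scoring model for each cluster.

    scores: {cluster_id: {model_name: validation_score}}, higher is better.
    Returns {cluster_id: model_name}. Names are sorted first and max() keeps
    the first maximum it sees, so ties go to the alphabetically first name.
    """
    return {
        cluster_id: max(sorted(model_scores), key=model_scores.get)
        for cluster_id, model_scores in scores.items()
    }


if __name__ == "__main__":
    toy = {0: {"XGBoost": 0.91, "SVM": 0.88}, 1: {"XGBoost": 0.79, "SVM": 0.83}}
    print(best_model_per_cluster(toy))  # {0: 'XGBoost', 1: 'SVM'}
```

Training one candidate per algorithm for each cluster and keeping only the winner lets each cluster of claims use whichever model fits it best.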
Make a Prediction
Once models are trained, you can predict fraud on new insurance claims.
Using the API
Send a POST request to the /predict endpoint.
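The following standard-library sketch sends the prediction request. It assumes the server runs at `http://localhost:5001` and accepts JSON. The `folderPath` key and the `Prediction_Batch_Files` folder name are hypothetical placeholders, so check the API Reference for the real fields. The response body is returned as raw text because its format is not assumed here.

```python
import json
import urllib.error
import urllib.request


def build_predict_request(payload, base_url="http://localhost:5001"):
    """Build a POST request to /predict with a JSON body (payload shape is assumed)."""
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, base_url="http://localhost:5001", timeout=300):
    """Send the request and return (status_code, body_text)."""
    req = build_predict_request(payload, base_url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a body that usually explains the error.
        return err.code, err.read().decode("utf-8")


if __name__ == "__main__":
    print(predict({"folderPath": "Prediction_Batch_Files"}))  # hypothetical payload
```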
Response
View Results
The prediction results are saved as a CSV file at Prediction_Output_File/Predictions.csv. Each prediction is labeled as follows:
- Y = Fraud detected
- N = No fraud detected
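You can summarize the output file by counting the Y and N labels. The following sketch uses the standard `csv` module. It assumes the file has a header row and that the label is stored in a column named `Predictions`. That column name is a guess, so set `column` to match the actual header.

```python
import csv
from collections import Counter


def count_predictions(path="Prediction_Output_File/Predictions.csv", column="Predictions"):
    """Count Y/N labels in the prediction CSV (the column name is an assumption)."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found; header is {reader.fieldnames}")
        return Counter(row[column].strip() for row in reader)


if __name__ == "__main__":
    counts = count_predictions()
    # Counter returns 0 for labels that never appear.
    print(f"Fraud (Y): {counts['Y']}, no fraud (N): {counts['N']}")
```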
Access Monitoring Dashboard
View API performance metrics and usage statistics, including:
- Request/response times
- Endpoint usage statistics
- Error rates
- Performance graphs
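Flask-MonitoringDashboard stores its metrics in a local SQLite database, so you can also inspect them without the web UI. The following sketch uses the standard `sqlite3` module and lists each table with its row count. It makes no assumptions about the schema. It does assume the database file is `flask_monitoringdashboard.db` in the working directory, and the helper name `table_row_counts` is hypothetical.

```python
import sqlite3


def table_row_counts(db_path="flask_monitoringdashboard.db"):
    """Return {table_name: row_count} for every table in the SQLite file."""
    # Open read-only via a URI so a mistyped path does not create an empty DB.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote as an SQL identifier
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    for table, rows in table_row_counts().items():
        print(f"{table}: {rows} rows")
```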
The monitoring dashboard uses Flask-MonitoringDashboard and stores metrics in flask_monitoringdashboard.db.
Next Steps
Installation Guide
Detailed installation instructions and troubleshooting
Training Guide
Learn about the model training pipeline
API Reference
Complete API documentation
Deployment
Deploy to production with Gunicorn
Troubleshooting
Port 5001 is already in use
Change the port by setting the PORT environment variable.
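This guide does not describe how the server chooses its port. A common pattern, assumed here rather than confirmed for this project, is to read `PORT` from the environment and fall back to 5001. The following sketch shows that pattern and adds a quick check for whether a port is already taken. Both `resolve_port` and `port_in_use` are hypothetical helpers.

```python
import os
import socket


def resolve_port(env=None, default=5001):
    """Read PORT from the environment, falling back to the default (assumed 5001)."""
    env = os.environ if env is None else env
    return int(env.get("PORT", default))


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails because another socket holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
        return False


if __name__ == "__main__":
    port = resolve_port()
    if port_in_use(port):
        print(f"Port {port} is busy; set PORT to another value, e.g. PORT=5002")
    else:
        print(f"Port {port} is free")
```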
ModuleNotFoundError when running
Ensure you’ve activated the conda environment and installed all dependencies from requirements.txt.
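If the error persists, check which interpreter is actually running and which imports fail. The following standard-library sketch relies on conda's usual behavior of setting `CONDA_DEFAULT_ENV` when an environment is activated. The module list is a sample drawn from the packages named in this guide. It uses import names, which can differ from package names (for example, `sklearn` for scikit-learn).

```python
import importlib.util
import os
import sys


def missing_modules(names):
    """Return the import names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    print("Active conda env:", os.environ.get("CONDA_DEFAULT_ENV", "<none>"))
    missing = missing_modules(
        ["flask", "sklearn", "xgboost", "pandas", "flask_monitoringdashboard"])
    print("Missing:", ", ".join(missing) or "none")
```

If `Interpreter` points outside your conda environment, activate the environment and run the script again.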
Training data not found
Ensure your training data CSV files are in the Training_Batch_Files/ directory and follow the naming format fraudDetection_[DATESTAMP]_[TIMESTAMP].csv.
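To find files with non-conforming names, you can check each name against a regular expression. This sketch assumes that DATESTAMP is 8 digits and TIMESTAMP is 6 digits (for example, `fraudDetection_01012020_120000.csv`). Those lengths are an assumption, not something stated in this guide, so adjust `FILENAME_PATTERN` if your schema differs.

```python
import re
from pathlib import Path

# Assumed layout: 8-digit date stamp, 6-digit time stamp.
FILENAME_PATTERN = re.compile(r"^fraudDetection_\d{8}_\d{6}\.csv$")


def invalid_training_files(directory="Training_Batch_Files"):
    """Return sorted names of files in `directory` that don't match the pattern."""
    folder = Path(directory)
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder}/ does not exist")
    return sorted(p.name for p in folder.iterdir()
                  if p.is_file() and not FILENAME_PATTERN.match(p.name))


if __name__ == "__main__":
    bad = invalid_training_files()
    print("All file names look valid." if not bad else f"Rename these: {bad}")
```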