Fraud Detection System

End-to-end machine learning system for detecting fraudulent insurance claims using Flask, XGBoost, and K-Means clustering

Quick Start

Get up and running with the fraud detection system in minutes

Clone the repository

Clone the fraud detection project to your local machine:

git clone https://github.com/sujith52/fraud.git
cd fraud

Set up environment

Create a conda environment with Python 3.8 and install dependencies:

conda create -n fraud-env python=3.8 -y
conda activate fraud-env
pip install -r requirements.txt

The project uses specific library versions (Flask 1.1.1, scikit-learn 0.22.1, XGBoost 0.90) for compatibility.

Start the Flask application

Run the application locally:

python main.py

The server will start at http://127.0.0.1:5001

Make your first prediction

Upload a CSV file of insurance claims through the web interface, or call the API with the folder that holds the batch files:

curl -X POST http://127.0.0.1:5001/predict \
  -H "Content-Type: application/json" \
  -d '{"filepath": "Prediction_Batch_files/"}'

Example Response

Prediction File created at Prediction_Output_File/Predictions.csv!!!
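
The same request can be sent from Python using only the standard library. This is a minimal sketch that assumes the /predict endpoint accepts the JSON body shown in the curl example and replies with a plain-text message; build_predict_request and send are hypothetical helper names, not part of the project.

```python
import json
import urllib.request

# Assumed endpoint: the local server started with "python main.py".
PREDICT_URL = "http://127.0.0.1:5001/predict"


def build_predict_request(folder, url=PREDICT_URL):
    """Build (without sending) a POST request equivalent to the curl call."""
    body = json.dumps({"filepath": folder}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the server's reply as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, send(build_predict_request("Prediction_Batch_files/")) should return the same message as the curl call.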

Key Features

Everything you need for production-ready fraud detection

Multi-Model Detection

Automatically selects the best model between XGBoost and SVM using cross-validation and AUC scoring
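
The selection rule can be sketched without any ML dependencies: score each candidate's predicted fraud probabilities on the same held-out rows with ROC AUC and keep the best. This is an illustration rather than the project's code; roc_auc and select_best_model are hypothetical names, the probabilities are assumed to come from the already-trained XGBoost and SVM models, and a single held-out split stands in for the cross-validation the system performs.

```python
def roc_auc(y_true, scores):
    """ROC AUC: the probability that a random positive (fraud) row is scored
    above a random negative row, counting ties as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_best_model(candidate_scores, y_true):
    """Return (name, auc) of the candidate with the highest AUC.

    candidate_scores maps a model name to its predicted fraud probabilities
    for the same held-out rows as y_true (1 = fraud, 0 = legitimate).
    """
    results = {name: roc_auc(y_true, s) for name, s in candidate_scores.items()}
    best = max(results, key=results.get)
    return best, results[best]
```

In practice sklearn.metrics.roc_auc_score computes the same quantity far more efficiently; the pairwise loop is kept here only because it makes the definition of AUC explicit.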

K-Means Clustering

Segments the data with K-Means, using the elbow method to choose the number of clusters, and trains a separate model for each cluster
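
The elbow method can be illustrated without dependencies: run K-Means for a range of k values, record the within-cluster sum of squares (WCSS) for each, and take the k where the curve bends. This sketch is not the project's implementation; the deterministic farthest-first seeding and the rule for locating the bend (the point farthest below the chord joining the first and last points of the normalized curve) are choices made for this example. Training a separate model per cluster is not shown.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: take the first point, then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, labels, wcss)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, wcss


def elbow_k(k_values, wcss):
    """Pick the elbow of a decreasing WCSS curve. After scaling both axes to
    [0, 1], the chord from the first to the last point is the line x + y = 1,
    so the point farthest below it is the one maximizing 1 - x - y."""
    span_k = k_values[-1] - k_values[0]
    lo, hi = min(wcss), max(wcss)

    def gap(i):
        xn = (k_values[i] - k_values[0]) / span_k
        yn = (wcss[i] - lo) / (hi - lo)
        return 1.0 - xn - yn

    return k_values[max(range(len(k_values)), key=gap)]
```

With scikit-learn, the WCSS for each k is available as the inertia_ attribute of a fitted KMeans model, so only the bend-finding step would remain.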

Data Validation

Validates schema, file naming conventions, and data types before processing
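
The three checks can be sketched with the standard library. The file-name pattern fraudDetection_<8 digits>_<6 digits>.csv and the three-column schema below are hypothetical placeholders, not the project's actual conventions, and validate_batch_file is an illustrative name.

```python
import csv
import io
import re

# Hypothetical conventions, for illustration only: the project's real
# file-name pattern and schema are not shown on this page.
FILENAME_PATTERN = re.compile(r"^fraudDetection_\d{8}_\d{6}\.csv$")
SCHEMA = {"months_as_customer": int, "age": int, "total_claim_amount": float}


def validate_batch_file(filename, text, schema=SCHEMA, pattern=FILENAME_PATTERN):
    """Return a list of problems; an empty list means the file passes."""
    errors = []
    # 1. File naming convention.
    if not pattern.match(filename):
        errors.append(f"bad file name: {filename}")
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    # 2. Schema: exact column names, in order.
    if header != list(schema):
        errors.append(f"expected columns {list(schema)}, got {header}")
        return errors  # type checks are meaningless if the columns differ
    # 3. Data types: every value must parse as its column's type.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            errors.append(f"line {line_no}: expected {len(header)} values, got {len(row)}")
            continue
        for column, value in zip(header, row):
            cast = schema[column]
            try:
                cast(value)
            except ValueError:
                errors.append(f"line {line_no}: {column}={value!r} is not {cast.__name__}")
    return errors
```

Empty cells fail the type check in this sketch; a real validator would typically distinguish missing values from malformed ones.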

Batch Processing

Processes multiple insurance claims in batch mode and writes the results to a CSV file
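
Batch scoring amounts to looping over every file in the input folder and appending one prediction per row to a single output file. The sketch assumes CSV inputs with a header row; predict_batch, predict_row, and the output columns (source_file, row, prediction) are hypothetical, and predict_row stands in for whichever trained model the real system applies to each row.

```python
import csv
from pathlib import Path


def predict_batch(input_dir, output_file, predict_row):
    """Score every CSV in input_dir and write one combined predictions file.

    predict_row is a placeholder for the trained model: it receives one row
    as a dict (column name -> string value) and returns a label.
    Returns the number of rows scored.
    """
    output_file = Path(output_file)
    output_file.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with output_file.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["source_file", "row", "prediction"])
        # Sorting keeps the output order deterministic across runs.
        for path in sorted(Path(input_dir).glob("*.csv")):
            with path.open(newline="") as f:
                for i, row in enumerate(csv.DictReader(f), start=1):
                    writer.writerow([path.name, i, predict_row(row)])
                    written += 1
    return written
```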

Flask API

RESTful API endpoints for training models and generating predictions

Monitoring Dashboard

Built-in Flask monitoring dashboard for tracking API performance

Explore by Topic

Deep dive into specific areas of the system

System Architecture

Understand the ML pipeline from data ingestion to prediction serving

Data Preprocessing

Feature engineering, encoding, scaling, and handling missing values
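
Three of the operations named above can be sketched with the standard library alone: mean imputation for missing numeric values, one-hot encoding for a categorical column, and z-score scaling. These are generic stand-ins chosen for illustration; the project may use different imputers, encoders, or scalers, and feature engineering is not shown.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None with the mean of the observed values.
    (Raises statistics.StatisticsError if every value is missing.)"""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode a categorical column as {category: [0/1 per row]}."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def standardize(values):
    """Scale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no signal
    return [(v - mu) / sigma for v in values]
```

A production pipeline would learn the fill values, category lists, means, and standard deviations on the training data and reuse them unchanged at prediction time, so that both stages produce identical feature columns.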

Model Selection

Hyperparameter tuning with GridSearchCV for XGBoost and SVM
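
Grid search itself is simple: enumerate every combination in a parameter grid, score each one, and keep the best. The dependency-free sketch below uses real XGBoost parameter names with hypothetical values, and score_fn is a placeholder for the mean cross-validated score that GridSearchCV computes for each setting; with scikit-learn available, GridSearchCV(estimator, param_grid, cv=5) performs the same enumeration plus the cross-validation.

```python
from itertools import product

# Hypothetical grid: real XGBoost parameter names, illustrative values.
EXAMPLE_XGB_GRID = {
    "learning_rate": [0.5, 0.1, 0.01],
    "max_depth": [3, 5, 10],
    "n_estimators": [10, 50, 100],
}


def expand_grid(param_grid):
    """Yield every combination of a {name: [values]} grid as a dict."""
    names = sorted(param_grid)
    for combo in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, combo))


def grid_search(param_grid, score_fn):
    """Score every combination; return (best_params, best_score).

    score_fn stands in for the mean cross-validated score (e.g. AUC) of one
    parameter setting.
    """
    best_params, best_score = None, float("-inf")
    for params in expand_grid(param_grid):
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The cost grows multiplicatively with each parameter added: the three-by-three-by-three grid above already means 27 settings, each trained once per cross-validation fold.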

Production Deployment

Deploy to Heroku or your own infrastructure with Gunicorn
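
Gunicorn can read its settings from a plain Python file passed with gunicorn -c gunicorn.conf.py main:app. A minimal sketch of such a file follows; it assumes main.py exposes a Flask object named app (not confirmed on this page), and the fallback worker count is illustrative rather than a tuned value.

```python
# gunicorn.conf.py: a sketch of a Gunicorn configuration file (plain Python).
# Assumption: the Flask object is named "app" in main.py, so the server is
# started with "gunicorn -c gunicorn.conf.py main:app".
import os

# Heroku tells the web process which port to listen on via $PORT;
# fall back to the Quick Start port for local runs.
bind = "0.0.0.0:" + os.environ.get("PORT", "5001")

# Some platforms, Heroku among them, may set WEB_CONCURRENCY;
# otherwise start with an illustrative two worker processes.
workers = int(os.environ.get("WEB_CONCURRENCY", 2))
```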

Ready to detect fraud?

Follow the quickstart guide to set up the system and start detecting fraudulent insurance claims in minutes.
