
Overview

This guide provides comprehensive installation instructions for the Fraud Detection ML System. Follow these steps to set up your development environment and get the system running on your machine.
Installation typically takes 10-15 minutes depending on your internet connection and system specifications.

System Requirements

Hardware Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| CPU | 2 cores | 4+ cores |
| RAM | 4 GB | 8+ GB |
| Storage | 2 GB free | 5+ GB free |
| OS | Linux, macOS, Windows 10+ | Linux (Ubuntu 18.04+) |

Software Prerequisites

Before installing, ensure you have the following software:
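Python 3.8
The system is tested with Python 3.8. Newer versions may work, but requirements.txt pins old releases (for example numpy 1.18.1 and scikit-learn 0.22.1), so Python 3.8 is recommended for compatibility.
Check your Python version: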
python --version
# or
python3 --version
Expected output:
Python 3.8.x
If you don’t have Python 3.8, we recommend installing Miniconda; Step 2 (Create Conda Environment) then creates an isolated environment with Python 3.8.
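The same version check can be scripted, for example at the top of a setup or CI script. This is a minimal sketch, not a file shipped with the repository; warning instead of exiting is an assumption.

```python
import sys

# The guide targets Python 3.8 (major, minor).
REQUIRED = (3, 8)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter's major.minor matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"OK: Python {current}")
    else:
        # Warn rather than fail: newer versions may still work.
        print(f"Warning: Python {current} detected; this guide expects 3.8.x")
```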
Git
Git is required to clone the repository. Install Git (Ubuntu/Debian):
sudo apt-get update
sudo apt-get install git
Verify installation:
git --version
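To confirm in one step that the command-line tools this guide relies on are reachable, a small sketch can look them up on PATH with `shutil.which`. The tool list (`git`, `conda`) is taken from this guide; the helper itself is hypothetical.

```python
import shutil

# Command-line tools used throughout this guide.
REQUIRED_TOOLS = ("git", "conda")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("All tools found" if not missing else f"Missing: {', '.join(missing)}")
```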

Installation Steps

Step 1: Clone the Repository

First, clone the fraud detection repository to your local machine:
git clone <repository-url>
cd source  # use the directory git clone created; this guide calls it "source"
Verify directory structure:
ls -la
You should see directories and files such as:
  • Training_Batch_Files/
  • Prediction_Batch_files/
  • application_logging/
  • Files: main.py, trainingModel.py, requirements.txt
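The layout check can also be automated. This sketch assumes it is run from the repository root and checks only the entries named above; the helper name is hypothetical.

```python
from pathlib import Path

# Entries this guide says a fresh clone should contain.
EXPECTED_DIRS = ("Training_Batch_Files", "Prediction_Batch_files", "application_logging")
EXPECTED_FILES = ("main.py", "trainingModel.py", "requirements.txt")


def missing_entries(root="."):
    """List expected directories (with a trailing /) and files absent under root."""
    base = Path(root)
    missing = [name + "/" for name in EXPECTED_DIRS if not (base / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (base / name).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_entries()
    print("Repository layout looks complete" if not problems else f"Missing: {problems}")
```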

Step 2: Create Conda Environment

Create a new conda environment named fraud-detection with Python 3.8:
conda create -n fraud-detection python=3.8 -y
This command:
  • Creates an isolated environment named fraud-detection
  • Installs Python 3.8
  • Answers the confirmation prompt automatically (the -y flag)
Expected output:
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/user/miniconda3/envs/fraud-detection
  
  added / updated specs:
    - python=3.8

...

# To activate this environment, use
#
#     $ conda activate fraud-detection

Step 3: Activate the Environment

Activate the newly created conda environment:
conda activate fraud-detection
Your terminal prompt should change to show (fraud-detection) at the beginning:
(fraud-detection) user@machine:~/source$
You must activate this environment every time you open a new terminal session before working with the fraud detection system.
Tip: To automatically activate the environment, add this to your .bashrc or .zshrc:
# Auto-activate fraud-detection environment
conda activate fraud-detection
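Scripts can verify that they are running inside the right environment. `conda activate` normally sets the `CONDA_DEFAULT_ENV` variable to the name of the activated environment, so a minimal sketch can read it; the helper name is hypothetical, and `which python` (see Troubleshooting) remains the direct check.

```python
import os
import sys

ENV_NAME = "fraud-detection"


def active_env(environ=os.environ):
    """Return the name of the activated conda environment, or None if none is active."""
    # conda activate sets CONDA_DEFAULT_ENV to the environment name (or path).
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    name = active_env()
    if name == ENV_NAME:
        print(f"OK: running in {name} ({sys.prefix})")
    else:
        print(f"Warning: active environment is {name!r}, expected {ENV_NAME!r}")
```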

Step 4: Install Python Dependencies

Install all required Python packages from requirements.txt:
pip install -r requirements.txt
This installs the pinned packages listed below (approximately 500 MB in total). The installation typically takes 3-5 minutes.
The following packages will be installed:
APScheduler==3.6.3
attrs==19.3.0
certifi==2019.11.28
Click==7.0
colorhash==1.0.2
configparser==4.0.2
cycler==0.10.0
Flask==1.1.1
Flask-Cors==3.0.8
Flask-MonitoringDashboard==3.0.6
gunicorn==20.0.4
imbalanced-learn==0.6.1
imblearn==0.0
importlib-metadata==1.4.0
itsdangerous==1.1.0
Jinja2==2.11.0
joblib==0.14.1
jsonschema==3.2.0
kiwisolver==1.1.0
kneed==0.5.1
MarkupSafe==1.1.1
matplotlib==3.1.2
more-itertools==8.1.0
numpy==1.18.1
pandas==0.25.3
psutil==5.6.7
pyparsing==2.4.6
pyrsistent==0.15.7
python-dateutil==2.8.1
pytz==2019.3
PyYAML==5.3
regexp==0.1
scikit-learn==0.22.1
scipy==1.4.1
six==1.14.0
sklearn==0.0
sklearn-pandas==1.8.0
SQLAlchemy==1.3.13
tzlocal==2.0.0
Werkzeug==0.16.1
wincertstore==0.2
xgboost==0.90
zipp==2.0.1

Key Dependencies Explained

| Package | Version | Purpose |
| --- | --- | --- |
| Flask | 1.1.1 | Web framework for REST API |
| Flask-Cors | 3.0.8 | Enable CORS for API access |
| Flask-MonitoringDashboard | 3.0.6 | Monitor API performance and usage |
| scikit-learn | 0.22.1 | Machine learning algorithms (RandomForest, etc.) |
| xgboost | 0.90 | Gradient boosting classifier |
| imbalanced-learn | 0.6.1 | Handle imbalanced fraud detection datasets |
| pandas | 0.25.3 | Data manipulation and CSV processing |
| numpy | 1.18.1 | Numerical computing |
| matplotlib | 3.1.2 | Plotting for KMeans elbow curves |
| kneed | 0.5.1 | Automatic elbow detection in clustering |
| SQLAlchemy | 1.3.13 | Database operations |
| gunicorn | 20.0.4 | Production WSGI server |

Step 5: Verify Installation

Verify that all critical packages are installed correctly:
python -c "import flask, sklearn, xgboost, pandas, numpy, imblearn; print('✓ All packages installed successfully!')"
Expected output:
✓ All packages installed successfully!
Check package versions:
python -c "import flask, sklearn, xgboost; print(f'Flask: {flask.__version__}\nScikit-learn: {sklearn.__version__}\nXGBoost: {xgboost.__version__}')"
Expected output:
Flask: 1.1.1
Scikit-learn: 0.22.1
XGBoost: 0.90
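To check every pin rather than a handful of imports, the sketch below compares requirements.txt with what is installed, using `importlib.metadata` from the standard library (available since Python 3.8). It is an illustration, not a script in the repository: it compares version strings literally and looks packages up by the names written in requirements.txt, so a package whose metadata spells its name or version differently may be reported as a mismatch. `pip list` remains the authoritative view.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def parse_pins(lines):
    """Map package name -> pinned version for lines of the form name==x.y.z."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins, lookup=version):
    """Return {name: (pinned, installed or None)} for every pin that is not satisfied."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # not installed in the active environment
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if not requirements.is_file():
        print("requirements.txt not found; run this from the repository root")
    else:
        pins = parse_pins(requirements.read_text().splitlines())
        problems = find_mismatches(pins)
        for name, (pinned, installed) in sorted(problems.items()):
            print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
        print(f"{len(pins) - len(problems)} of {len(pins)} pins satisfied")
```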

Step 6: Initialize Required Directories

The system requires several directories for logs and data. Most are included in the repository, but verify they exist:
# Check for required directories (missing ones are simply left out of the output)
ls -d Training_Batch_Files/ Prediction_Batch_files/ \
      Prediction_Output_File/ Training_Logs/ Prediction_Logs/ 2>/dev/null
If any directories are missing, create them:
mkdir -p Training_Batch_Files Prediction_Batch_files \
         Prediction_Output_File Training_Logs Prediction_Logs
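On Windows, or anywhere a POSIX shell is not available, the same step can be done from Python with `pathlib`. This is a sketch using the directory names listed in this guide; `ensure_dirs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Directory names listed in this guide; call ensure_dirs() from the repository root.
REQUIRED_DIRS = (
    "Training_Batch_Files",
    "Prediction_Batch_files",
    "Prediction_Output_File",
    "Training_Logs",
    "Prediction_Logs",
)


def ensure_dirs(root=".", names=REQUIRED_DIRS):
    """Create any missing directories under root; return the names that were created."""
    created = []
    for name in names:
        path = Path(root) / name
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(name)
    return created
```

Calling `ensure_dirs()` a second time returns an empty list, so it is safe to run repeatedly, like `mkdir -p`.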

Step 7: Test the Installation

Start the Flask application to verify everything works:
python main.py
Expected output:
 * Serving Flask app "main" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://127.0.0.1:5001/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: xxx-xxx-xxx
Test the API: open a new terminal and run:
curl http://localhost:5001/
You should receive HTML content from the index page. Press Ctrl+C in the first terminal to stop the server.
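For scripted checks (for example in CI), the curl request can be replaced by a standard-library sketch that reports whether the index page answers with HTTP 200. The URL and port come from this guide; the helper name and the 5-second timeout are assumptions.

```python
from urllib.request import urlopen


def is_server_up(url="http://localhost:5001/", timeout=5.0):
    """Return True if url answers with HTTP 200 within timeout seconds."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses:
        # connection refused, error status, or no answer in time.
        return False


if __name__ == "__main__":
    print("Server is up" if is_server_up() else "Server did not respond with 200")
```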

Platform-Specific Notes

Linux Installation Notes

Linux is the recommended platform for production deployments.
Additional dependencies for Ubuntu/Debian:
sudo apt-get update
sudo apt-get install -y build-essential python3-dev
For XGBoost compilation (if needed):
sudo apt-get install -y cmake libboost-all-dev
Running as a service: create /etc/systemd/system/fraud-detection.service with the following contents:
[Unit]
Description=Fraud Detection ML System
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/source
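Environment="PATH=/home/your-user/miniconda3/envs/fraud-detection/bin:/usr/local/bin:/usr/bin:/bin"
# main.py starts the Flask development server; see Production Deployment for Gunicorn
ExecStart=/home/your-user/miniconda3/envs/fraud-detection/bin/python main.py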
Restart=on-failure

[Install]
WantedBy=multi-user.target
Reload systemd so it picks up the new unit, then enable and start it:
sudo systemctl daemon-reload
sudo systemctl enable fraud-detection
sudo systemctl start fraud-detection

Troubleshooting Common Issues

Problem: pip is not installed or not in PATH.
Solution:
# Reinstall pip
conda install pip

# or use conda's pip explicitly
/path/to/conda/envs/fraud-detection/bin/pip install -r requirements.txt
Problem: XGBoost compilation fails on your platform.
Solution 1 - Use conda:
conda install -c conda-forge xgboost=0.90
Solution 2 - Install build tools:
sudo apt-get install build-essential cmake
pip install xgboost==0.90
Problem: Missing build dependencies for scikit-learn.
Solution:
# Install via conda instead
conda install scikit-learn=0.22.1

# Then install remaining requirements
pip install -r requirements.txt
Problem: Wrong Python interpreter or environment not activated.
Solution:
# Ensure environment is activated
conda activate fraud-detection

# Verify which Python is being used
which python
# Should show: /path/to/miniconda3/envs/fraud-detection/bin/python

# Reinstall Flask if needed
pip install Flask==1.1.1
Problem: Another application is using port 5001.
Solution 1 - Use a different port:
export PORT=5002
python main.py
Solution 2 - Find and stop the process:
# Find the process using port 5001
lsof -i :5001

# Stop the process (fall back to kill -9 <PID> only if it does not exit)
kill <PID>
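Before stopping anything, you can check whether port 5001 is taken and find the next free port to use as PORT. This sketch uses a bind test, which is a quick heuristic rather than a guarantee: a port held in TIME_WAIT may also be reported as busy, and another process can grab the port between the check and server start. The helper names are hypothetical.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # address already in use (or not permitted)
        return True


def first_free_port(start=5001, attempts=20, host="127.0.0.1"):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if is_port_free(port, host):
            return port
    return None


if __name__ == "__main__":
    print("Port 5001 free:", is_port_free(5001))
    print("Suggested port:", first_free_port())
```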
Problem: Certificate verification fails during package installation.
Solution:
# Update certificates
conda update certifi

# or, as a last resort, mark the PyPI hosts as trusted (this skips certificate checks for them)
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org -r requirements.txt
Problem: Insufficient permissions to create directories or write files.
Solution (Linux/macOS):
# Fix directory permissions
chmod -R u+w .

# Don't use sudo with conda/pip
Windows:
# Run Anaconda Prompt as Administrator
# or adjust folder permissions in Properties
Problem: Insufficient RAM for large datasets.
Solution:
  1. Reduce batch size - Process smaller chunks of data
  2. Increase swap space (Linux):
    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    
  3. Use a machine with more RAM
  4. Optimize data preprocessing - Remove unnecessary columns earlier
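This guide does not describe how the pipeline loads its batch files, so the following is only a generic sketch of "process smaller chunks": it reads a CSV with the standard `csv` module and yields a bounded number of rows at a time. With pandas (already a dependency), the equivalent is `pandas.read_csv(path, chunksize=N)`, which returns an iterator of DataFrames. The function name, chunk size, and file path in the comments are hypothetical.

```python
import csv
from itertools import islice


def iter_chunks(file_obj, chunk_size=10_000):
    """Yield (header, rows) tuples with at most chunk_size data rows per chunk."""
    reader = csv.reader(file_obj)
    header = next(reader, None)
    if header is None:  # empty file: nothing to yield
        return
    while True:
        rows = list(islice(reader, chunk_size))
        if not rows:
            break
        yield header, rows


# Hypothetical usage: only one chunk of rows is held in memory at a time.
# with open("path/to/batch.csv", newline="") as handle:
#     for header, rows in iter_chunks(handle, chunk_size=5_000):
#         ...  # transform the rows, then write results to disk
```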

Production Deployment

The Flask development server (python main.py) is not suitable for production use. Use a production WSGI server instead.

Using Gunicorn (Linux/macOS)

Gunicorn is already included in requirements.txt.
# Run with 4 worker processes
gunicorn -w 4 -b 0.0.0.0:5001 main:app

# Development only: restart automatically on code changes
gunicorn -w 4 -b 0.0.0.0:5001 --reload main:app

# Production configuration (Gunicorn does not create the logs/ directory)
mkdir -p logs
gunicorn -w 4 \
  -b 0.0.0.0:5001 \
  --timeout 300 \
  --access-logfile logs/access.log \
  --error-logfile logs/error.log \
  main:app

Using Waitress (Windows)

pip install waitress
waitress-serve --port=5001 --threads=4 main:app

Environment Variables

Set these environment variables for production:
export FLASK_ENV=production
export PORT=5001
export WORKERS=4
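The guide does not show how PORT and WORKERS reach the server. For the Gunicorn deployment, one option is a config file: Gunicorn config files are plain Python, and `bind`, `workers`, `timeout`, `accesslog`, and `errorlog` are standard setting names. The file name gunicorn.conf.py and reading PORT/WORKERS here are assumptions about how you want to wire things up; main.py may read these variables on its own.

```python
# gunicorn.conf.py (hypothetical file); start the server with:
#   gunicorn -c gunicorn.conf.py main:app
import os

# Bind address: honor PORT if set, otherwise this guide's default of 5001.
bind = "0.0.0.0:" + os.environ.get("PORT", "5001")

# Worker processes: honor WORKERS if set, otherwise the 4 used in this guide.
workers = int(os.environ.get("WORKERS", "4"))

# Same values as the production command line above.
timeout = 300
accesslog = "logs/access.log"  # the logs/ directory must already exist
errorlog = "logs/error.log"
```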

Verification Checklist

Before proceeding, verify:
  • Python 3.8 is installed and active
  • Conda environment fraud-detection is created and activated
  • All packages from requirements.txt are installed
  • python -c "import flask, sklearn, xgboost" runs without errors
  • python main.py starts the Flask server successfully
  • Required directories exist (Training_Batch_Files, Prediction_Batch_files, etc.)
  • Port 5001 is accessible
  • You can access http://localhost:5001/ in a browser
Once all checklist items are verified, proceed to the Quickstart Guide to train your first model and make predictions.

Uninstallation

To completely remove the fraud detection system:
# Deactivate environment
conda deactivate

# Remove conda environment
conda env remove -n fraud-detection

# Remove source code
rm -rf /path/to/source

# Optional: Remove Miniconda
rm -rf ~/miniconda3

Getting Help

If you encounter issues not covered in this guide:
  1. Check logs - Review Training_Logs/ and Prediction_Logs/ for detailed error messages
  2. Verify data format - Ensure your CSV files match the schema in schema_training.json and schema_prediction.json
  3. Test with sample data - Use the provided sample files to isolate the issue
  4. Review dependencies - Run pip list to check installed package versions
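The log file names and format are not documented here, so this sketch simply collects the last lines of every file in a log folder, similar to `tail`, which is often enough to spot the first error. The helper name and the 20-line default are assumptions.

```python
from collections import deque
from pathlib import Path


def tail_logs(directory, lines=20):
    """Return {file name: last `lines` lines} for every regular file in directory."""
    tails = {}
    for path in sorted(Path(directory).glob("*")):  # empty if the folder is missing
        if path.is_file():
            with path.open(encoding="utf-8", errors="replace") as handle:
                last = deque(handle, maxlen=lines)  # keeps only the final lines
            tails[path.name] = [line.rstrip("\n") for line in last]
    return tails


if __name__ == "__main__":
    for folder in ("Training_Logs", "Prediction_Logs"):
        for name, tail in tail_logs(folder).items():
            print(f"==> {folder}/{name} <==")
            print("\n".join(tail))
```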

