Installation Guide

Overview

This guide provides comprehensive installation instructions for the Fraud Detection ML System. Follow these steps to set up your development environment and get the system running on your machine.

Installation typically takes 10-15 minutes depending on your internet connection and system specifications.

System Requirements

Hardware Requirements

Component	Minimum	Recommended
CPU	2 cores	4+ cores
RAM	4 GB	8+ GB
Storage	2 GB free	5+ GB free
OS	Linux, macOS, Windows 10+	Linux (Ubuntu 18.04+)
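The CPU and storage minimums in the table can be checked with a short script before installing anything. This is a minimal sketch using only the standard library; the function names are illustrative, and RAM is not checked because the standard library has no portable way to read it.

```python
import os
import shutil

MIN_CORES = 2    # minimum CPU cores from the table above
MIN_FREE_GB = 2  # minimum free storage in GB from the table above


def meets_minimum(cores, free_gb, min_cores=MIN_CORES, min_free_gb=MIN_FREE_GB):
    """Return True when both the core count and free disk space suffice."""
    return cores >= min_cores and free_gb >= min_free_gb


def current_resources(path="."):
    """Return (cpu_cores, free_disk_gb) for the machine and the given path."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    free_gb = shutil.disk_usage(path).free / 1024**3
    return cores, free_gb


if __name__ == "__main__":
    cores, free_gb = current_resources()
    verdict = "meets" if meets_minimum(cores, free_gb) else "is below"
    print(f"{cores} cores, {free_gb:.1f} GB free: {verdict} the minimum")
```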

Software Prerequisites

Before installing, ensure you have the following software:

Python 3.8 (Required)

The system is tested with Python 3.8. While newer versions may work, we recommend Python 3.8 for compatibility.

Check your Python version:

python --version
# or
python3 --version

Expected output:

Python 3.8.x

If you don’t have Python 3.8, we recommend installing it via Miniconda (see next section).
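The same check can be made from inside the interpreter, which is useful when several Python installations are on the PATH. The sketch below treats any 3.8.x release as supported, matching the recommendation above; the function names are illustrative.

```python
import sys

REQUIRED = (3, 8)  # major and minor version this guide is tested with


def is_supported(version_info, required=REQUIRED):
    """Return True when the major and minor version match the required pair."""
    return tuple(version_info[:2]) == required


def describe(version_info):
    """Format a version tuple such as (3, 8, 10) as '3.8.10'."""
    return ".".join(str(part) for part in version_info[:3])


if __name__ == "__main__":
    status = "OK" if is_supported(sys.version_info) else "untested"
    print(f"Python {describe(sys.version_info)}: {status}")
```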

Miniconda or Anaconda (Recommended)

Conda provides isolated environments and simplified package management, especially for scientific Python packages.

Install Miniconda (Linux x86_64):

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Verify installation:

conda --version

Expected output:

conda 23.x.x

Git (Required)

Git is required to clone the repository.

Install Git (Debian/Ubuntu):

sudo apt-get update
sudo apt-get install git

Verify installation:

git --version

Installation Steps

Clone the Repository

First, clone the fraud detection repository to your local machine:

git clone <repository-url>
cd source

Verify directory structure:

ls -la

You should see directories like:

Training_Batch_Files/
Prediction_Batch_files/
application_logging/
Files: main.py, trainingModel.py, requirements.txt

Create Conda Environment

Create a new conda environment named fraud-detection with Python 3.8:

conda create -n fraud-detection python=3.8 -y

This command:

Creates an isolated environment named fraud-detection
Installs Python 3.8
Confirms the installation prompts automatically (the -y flag)

Expected output:

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/user/miniconda3/envs/fraud-detection
  
  added / updated specs:
    - python=3.8

...

# To activate this environment, use
#
#     $ conda activate fraud-detection

Activate the Environment

Activate the newly created conda environment:

conda activate fraud-detection

Your terminal prompt should change to show (fraud-detection) at the beginning:

(fraud-detection) user@machine:~/source$

You must activate this environment every time you open a new terminal session before working with the fraud detection system.

Tip: To automatically activate the environment, add this to your .bashrc or .zshrc:

# Auto-activate fraud-detection environment
conda activate fraud-detection

Install Python Dependencies

Install all required Python packages from requirements.txt:

pip install -r requirements.txt

This installs the pinned packages listed below, roughly 500 MB in total. Installation typically takes 3-5 minutes.

View Complete Dependency List

The following packages will be installed:

APScheduler==3.6.3
attrs==19.3.0
certifi==2019.11.28
Click==7.0
colorhash==1.0.2
configparser==4.0.2
cycler==0.10.0
Flask==1.1.1
Flask-Cors==3.0.8
Flask-MonitoringDashboard==3.0.6
gunicorn==20.0.4
imbalanced-learn==0.6.1
imblearn==0.0
importlib-metadata==1.4.0
itsdangerous==1.1.0
Jinja2==2.11.0
joblib==0.14.1
jsonschema==3.2.0
kiwisolver==1.1.0
kneed==0.5.1
MarkupSafe==1.1.1
matplotlib==3.1.2
more-itertools==8.1.0
numpy==1.18.1
pandas==0.25.3
psutil==5.6.7
pyparsing==2.4.6
pyrsistent==0.15.7
python-dateutil==2.8.1
pytz==2019.3
PyYAML==5.3
regexp==0.1
scikit-learn==0.22.1
scipy==1.4.1
six==1.14.0
sklearn==0.0
sklearn-pandas==1.8.0
SQLAlchemy==1.3.13
tzlocal==2.0.0
Werkzeug==0.16.1
wincertstore==0.2
xgboost==0.90
zipp==2.0.1

Key Dependencies Explained

Package	Version	Purpose
Flask	1.1.1	Web framework for REST API
Flask-Cors	3.0.8	Enable CORS for API access
Flask-MonitoringDashboard	3.0.6	Monitor API performance and usage
scikit-learn	0.22.1	Machine learning algorithms (RandomForest, etc.)
xgboost	0.90	Gradient boosting classifier
imbalanced-learn	0.6.1	Handle imbalanced fraud detection datasets
pandas	0.25.3	Data manipulation and CSV processing
numpy	1.18.1	Numerical computing
matplotlib	3.1.2	Plotting for KMeans elbow curves
kneed	0.5.1	Automatic elbow detection in clustering
SQLAlchemy	1.3.13	Database operations
gunicorn	20.0.4	Production WSGI server

Verify Installation

Verify that all critical packages are installed correctly:

python -c "import flask, sklearn, xgboost, pandas, numpy, imblearn; print('✓ All packages installed successfully!')"

Expected output:

✓ All packages installed successfully!

Check package versions:

python -c "import flask, sklearn, xgboost; print(f'Flask: {flask.__version__}\nScikit-learn: {sklearn.__version__}\nXGBoost: {xgboost.__version__}')"

Expected output:

Flask: 1.1.1
Scikit-learn: 0.22.1
XGBoost: 0.90
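To compare every pin rather than three packages, the installed versions can be read with importlib.metadata, which is part of the standard library from Python 3.8. This is a sketch that assumes requirements.txt contains only plain name==version lines like those listed above; the helper names are illustrative.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling find_mismatches(parse_pins(open("requirements.txt").read())) from the repository root returns an empty dict when every pinned version is installed.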

Initialize Required Directories

The system requires several directories for logs and data. Most are included in the repository, but verify they exist:

# Check for required directories
ls -d Training_Batch_Files/ Prediction_Batch_files/ \
      Prediction_Output_File/ Training_Logs/ Prediction_Logs/ 2>/dev/null

If any directories are missing, create them:

mkdir -p Training_Batch_Files Prediction_Batch_files \
         Prediction_Output_File Training_Logs Prediction_Logs
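On systems without a POSIX shell, the same check-and-create step can be done from Python. The sketch below uses the directory names from the commands above; the function name is illustrative.

```python
from pathlib import Path

# Directory names taken from the shell commands above.
REQUIRED_DIRS = [
    "Training_Batch_Files",
    "Prediction_Batch_files",
    "Prediction_Output_File",
    "Training_Logs",
    "Prediction_Logs",
]


def ensure_directories(base, names=REQUIRED_DIRS):
    """Create any missing directories under base and return the names created."""
    created = []
    for name in names:
        target = Path(base) / name
        if not target.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            created.append(name)
    return created
```

Running ensure_directories(".") from the repository root returns the list of directories it had to create, which is empty when everything is already in place.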

Test the Installation

Start the Flask application to verify everything works:

python main.py

Expected output:

 * Serving Flask app "main" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on http://127.0.0.1:5001/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: xxx-xxx-xxx

Test the API:

Open a new terminal and run:

curl http://localhost:5001/

You should receive HTML content from the index page.

Press Ctrl+C in the first terminal to stop the server.
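Where curl is not available, the same smoke test can be run with Python's standard library. This is a sketch, not part of the project; the opener parameter exists only so the function can be exercised without a running server.

```python
import urllib.error
import urllib.request


def check_server(url="http://localhost:5001/", timeout=5,
                 opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if nothing answers."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, and so on


if __name__ == "__main__":
    status = check_server()
    print("server not reachable" if status is None else f"HTTP {status}")
```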

Platform-Specific Notes


Linux Installation Notes

Linux is the recommended platform for production deployments.

Additional dependencies for Ubuntu/Debian:

sudo apt-get update
sudo apt-get install -y build-essential python3-dev

For XGBoost compilation (if needed):

sudo apt-get install -y cmake libboost-all-dev

Running as a service:

Create /etc/systemd/system/fraud-detection.service:

[Unit]
Description=Fraud Detection ML System
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/source
Environment="PATH=/home/your-user/miniconda3/envs/fraud-detection/bin"
ExecStart=/home/your-user/miniconda3/envs/fraud-detection/bin/python main.py
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl enable fraud-detection
sudo systemctl start fraud-detection

macOS Installation Notes

Install Xcode Command Line Tools:

xcode-select --install

Using Homebrew (recommended):

brew install python@3.8 git

M1/M2 Mac considerations:

If you’re on Apple Silicon, some packages may require Rosetta 2:

# Install Rosetta 2
softwareupdate --install-rosetta

# Create environment with x86_64 architecture
CONDA_SUBDIR=osx-64 conda create -n fraud-detection python=3.8 -y
conda activate fraud-detection
conda config --env --set subdir osx-64

Windows Installation Notes

Use Anaconda Prompt or PowerShell:

All commands should be run in Anaconda Prompt (installed with Miniconda) or PowerShell.

Microsoft Visual C++ 14.0:

Some packages require Visual C++ build tools. Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/

Path separator differences:

In Python code, use forward slashes or raw strings:

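# Good: forward slashes work on every platform
path = "Training_Batch_Files/"
# or a raw string; a raw string cannot end in a single backslash,
# so leave the trailing separator off and add it when joining
path = r"Training_Batch_Files"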
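Another way to avoid separator problems is to build paths with pathlib, which picks the right separator for the platform it runs on. The sketch below is illustrative only; "example.csv" is a hypothetical file name, and the project's own code may build paths differently.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Path uses the separator of the operating system it runs on.
batch_dir = Path("Training_Batch_Files")
sample = batch_dir / "example.csv"  # "example.csv" is a hypothetical file name

# The pure variants show how the same join renders on each platform.
posix_form = str(PurePosixPath("Training_Batch_Files") / "example.csv")
windows_form = str(PureWindowsPath("Training_Batch_Files") / "example.csv")

print(posix_form)    # Training_Batch_Files/example.csv
print(windows_form)  # Training_Batch_Files\example.csv
```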

Running in production:

For production on Windows, use waitress instead of gunicorn:

pip install waitress
waitress-serve --port=5001 main:app

Troubleshooting Common Issues

Error: 'pip' command not found

Problem: pip is not installed or not in PATH.

Solution:

# Reinstall pip
conda install pip

# or use conda's pip explicitly
/path/to/conda/envs/fraud-detection/bin/pip install -r requirements.txt

Error installing XGBoost

Problem: XGBoost compilation fails on your platform.

Solution 1 - Use conda:

conda install -c conda-forge xgboost=0.90

Solution 2 - Install build tools:

sudo apt-get install build-essential cmake
pip install xgboost==0.90

Error: Failed building wheel for scikit-learn

Problem: Missing build dependencies for scikit-learn.

Solution:

# Install via conda instead
conda install scikit-learn=0.22.1

# Then install remaining requirements
pip install -r requirements.txt

ImportError: No module named 'flask'

Problem: Wrong Python interpreter or environment not activated.

Solution:

# Ensure environment is activated
conda activate fraud-detection

# Verify which Python is being used
which python
# Should show: /path/to/miniconda3/envs/fraud-detection/bin/python

# Reinstall Flask if needed
pip install Flask==1.1.1
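The interpreter can also report which environment it belongs to, which works the same way on Windows where the which command is unavailable. This sketch relies on the CONDA_DEFAULT_ENV variable that conda sets on activation and falls back to the interpreter prefix; the function name is illustrative.

```python
import os
import sys


def active_env_name(environ=os.environ, prefix=sys.prefix):
    """Best-effort name of the active environment.

    Conda sets CONDA_DEFAULT_ENV on activation; if it is missing, fall back
    to the last component of the interpreter prefix.
    """
    name = environ.get("CONDA_DEFAULT_ENV")
    if name:
        return name
    return os.path.basename(os.path.normpath(prefix))


if __name__ == "__main__":
    print(f"interpreter: {sys.executable}")
    print(f"environment: {active_env_name()}")
```

If the reported environment is not fraud-detection, activate it and retry the import.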

Port 5001 already in use

Problem: Another application is using port 5001.

Solution 1 - Use different port:

export PORT=5002
python main.py

Solution 2 - Find and kill process:

# Find process using port 5001
lsof -i :5001

# Kill the process
kill -9 <PID>
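Before choosing a different port, it can help to confirm which ports are actually free. The sketch below tries to bind a TCP socket to each candidate; it is a generic check, not something the project provides, and the function names are illustrative.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=5001, attempts=20, host="127.0.0.1"):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    return None


if __name__ == "__main__":
    print(f"first free port from 5001: {first_free_port()}")
```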

SSL Certificate errors

Problem: Certificate verification fails during package installation.

Solution:

# Update certificates
conda update certifi

# or install with pip
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org -r requirements.txt

Permission denied errors

Problem: Insufficient permissions to create directories or write files.

Solution:

Linux/macOS:

# Fix directory permissions
chmod -R u+w .

# Don't use sudo with conda/pip

Windows:

# Run Anaconda Prompt as Administrator
# or adjust folder permissions in Properties

Memory errors during training

Problem: Insufficient RAM for large datasets.

Solution:

Reduce batch size - Process smaller chunks of data

Increase swap space (Linux):

sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Use a machine with more RAM
Optimize data preprocessing - Remove unnecessary columns earlier
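As an illustration of processing smaller chunks, the sketch below reads a CSV file a fixed number of rows at a time using only the standard library. How the project's own training code loads its data is not shown in this guide, so treat this as a generic pattern; the function name and default chunk size are assumptions.

```python
import csv
from itertools import islice


def iter_chunks(path, chunk_size=10_000):
    """Yield lists of at most chunk_size rows (as dicts) from a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                return  # no rows left
            yield chunk
```

Only one chunk is held in memory at a time. With pandas, the chunksize parameter of read_csv gives a similar iterator of DataFrames.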

Production Deployment

The Flask development server (python main.py) is not suitable for production use. Use a production WSGI server instead.

Using Gunicorn (Linux/macOS)

Gunicorn is already included in requirements.txt.

# Run with 4 worker processes
gunicorn -w 4 -b 0.0.0.0:5001 main:app

# With automatic restart on code changes (intended for development)
gunicorn -w 4 -b 0.0.0.0:5001 --reload main:app

# Production configuration (create the logs/ directory first)
gunicorn -w 4 \
  -b 0.0.0.0:5001 \
  --timeout 300 \
  --access-logfile logs/access.log \
  --error-logfile logs/error.log \
  main:app

Using Waitress (Windows)

pip install waitress
waitress-serve --port=5001 --threads=4 main:app

Environment Variables

Set these environment variables for production:

export FLASK_ENV=production
export PORT=5001
export WORKERS=4
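This guide does not say which component reads PORT and WORKERS. One way to make Gunicorn honor them is a config file, sketched below under that assumption. The names bind, workers, and timeout are Gunicorn settings; the file name gunicorn.conf.py and the build_settings helper are illustrative.

```python
import os


def build_settings(environ):
    """Translate PORT and WORKERS into Gunicorn setting values."""
    port = int(environ.get("PORT", "5001"))
    workers = int(environ.get("WORKERS", "4"))
    return {
        "bind": f"0.0.0.0:{port}",
        "workers": workers,
        "timeout": 300,  # same timeout as the command-line example above
    }


# Gunicorn reads module-level names from its config file as settings.
_settings = build_settings(os.environ)
bind = _settings["bind"]
workers = _settings["workers"]
timeout = _settings["timeout"]
```

Saved as gunicorn.conf.py in the repository root, it can be used with gunicorn -c gunicorn.conf.py main:app.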

Verification Checklist

Before proceeding, verify:

Python 3.8 is installed and active
Conda environment fraud-detection is created and activated
All packages from requirements.txt are installed
python -c "import flask, sklearn, xgboost" runs without errors
python main.py starts the Flask server successfully
Required directories exist (Training_Batch_Files, Prediction_Batch_files, etc.)
Port 5001 is accessible
You can access http://localhost:5001/ in a browser

Once all checklist items are verified, proceed to the Quickstart Guide to train your first model and make predictions.

Uninstallation

To completely remove the fraud detection system:

# Deactivate environment
conda deactivate

# Remove conda environment
conda env remove -n fraud-detection

# Remove source code
rm -rf /path/to/source

# Optional: Remove Miniconda
rm -rf ~/miniconda3

Getting Help

If you encounter issues not covered in this guide:

Check logs - Review Training_Logs/ and Prediction_Logs/ for detailed error messages
Verify data format - Ensure your CSV files match the schema in schema_training.json and schema_prediction.json
Test with sample data - Use the provided sample files to isolate the issue
Review dependencies - Run pip list to check installed package versions

Quickstart Guide

Ready to start? Follow the quickstart guide to train your first fraud detection model.
