Overview
This guide provides comprehensive installation instructions for the Fraud Detection ML System. Follow these steps to set up your development environment and get the system running on your machine.
Installation typically takes 10-15 minutes, depending on your internet connection and system specifications.
System Requirements
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4+ cores |
| RAM | 4 GB | 8+ GB |
| Storage | 2 GB free | 5+ GB free |
| OS | Linux, macOS, Windows 10+ | Linux (Ubuntu 18.04+) |
Software Prerequisites
Before installing, ensure you have the following software:
Python 3.8 (Required)
The system is tested with Python 3.8. Newer versions may work, but we recommend Python 3.8 for compatibility.
Check your Python version with `python --version`. The output should report a 3.8.x release.
If you don't have Python 3.8, we recommend installing it via Miniconda (see next section).
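The version check can also be made from inside the interpreter, which helps when several Python installations share the same PATH. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of the repository.

```python
import sys

REQUIRED = (3, 8)  # the version this guide says the system is tested with


def is_supported_python(version_info=sys.version_info):
    """Return True when the major.minor version matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {found}: matches the tested version")
    else:
        print(f"Python {found}: differs from the tested 3.8, incompatibilities are possible")
```

Run it with the same `python` command you intend to use for the project, so that the interpreter being tested is the one that will run the system.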
Miniconda or Anaconda (Recommended)
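Conda provides isolated environments and simplified package management, especially for scientific Python packages.
Install Miniconda using the installer for your platform, then verify the installation with `conda --version`, which should print the installed conda version.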
Git (Required)
Git is required to clone the repository. Install Git with your platform's package manager or installer, then verify the installation with `git --version`.
Installation Steps
Clone the Repository
First, clone the fraud detection repository to your local machine with `git clone`, then change into the project directory and list its contents to verify the directory structure. You should see entries like:
- Directories: Training_Batch_Files/, Prediction_Batch_files/, application_logging/
- Files: main.py, trainingModel.py, requirements.txt
Create Conda Environment
Create a new conda environment named fraud-detection with Python 3.8 by running `conda create -n fraud-detection python=3.8 -y`. This command:
- Creates an isolated environment named fraud-detection
- Installs Python 3.8
- Uses the -y flag to automatically confirm the installation
Activate the Environment
Activate the newly created conda environment with `conda activate fraud-detection`. Your terminal prompt should change to show (fraud-detection) at the beginning.
Tip: To automatically activate the environment in every new shell, add the same `conda activate fraud-detection` line to your .bashrc or .zshrc.
Install Python Dependencies
Install all required Python packages from requirements.txt with `pip install -r requirements.txt`. This installs 44 packages totaling approximately 500 MB. The installation typically takes 3-5 minutes.
View Complete Dependency List
The full list of packages to be installed is the content of requirements.txt. The most important ones are explained in the table below.
Key Dependencies Explained
| Package | Version | Purpose |
|---|---|---|
| Flask | 1.1.1 | Web framework for REST API |
| Flask-Cors | 3.0.8 | Enable CORS for API access |
| Flask-MonitoringDashboard | 3.0.6 | Monitor API performance and usage |
| scikit-learn | 0.22.1 | Machine learning algorithms (RandomForest, etc.) |
| xgboost | 0.90 | Gradient boosting classifier |
| imbalanced-learn | 0.6.1 | Handle imbalanced fraud detection datasets |
| pandas | 0.25.3 | Data manipulation and CSV processing |
| numpy | 1.18.1 | Numerical computing |
| matplotlib | 3.1.2 | Plotting for KMeans elbow curves |
| kneed | 0.5.1 | Automatic elbow detection in clustering |
| SQLAlchemy | 1.3.13 | Database operations |
| gunicorn | 20.0.4 | Production WSGI server |
Verify Installation
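Verify that all critical packages are installed correctly by importing them, for example with `python -c "import flask, sklearn, xgboost"`, which should finish without errors. Then check the package versions with `pip list` and compare them against the Key Dependencies Explained table.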
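The version comparison can be automated. The sketch below is one possible approach, not a script shipped with the repository: it reads installed versions through the standard library's `importlib.metadata` (available from Python 3.8) and compares them with a few pins copied from the Key Dependencies Explained table. The function name `check_versions` is hypothetical.

```python
from importlib import metadata  # in the standard library since Python 3.8

# Pinned versions copied from the "Key Dependencies Explained" table.
EXPECTED = {
    "Flask": "1.1.1",
    "scikit-learn": "0.22.1",
    "xgboost": "0.90",
    "imbalanced-learn": "0.6.1",
    "pandas": "0.25.3",
    "numpy": "1.18.1",
}


def check_versions(expected):
    """Return {package: (status, installed_version)} with status ok/mismatch/missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = ("missing", None)
            continue
        status = "ok" if installed == wanted else "mismatch"
        report[name] = (status, installed)
    return report


if __name__ == "__main__":
    for name, (status, installed) in check_versions(EXPECTED).items():
        print(f"{name:<20} {status:<10} installed={installed} expected={EXPECTED[name]}")
```

Any line reported as missing or mismatch points at a package to reinstall from requirements.txt inside the activated environment.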
Initialize Required Directories
The system requires several directories for logs and data. Most are included in the repository, but verify that they exist and create any that are missing.
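A small script can check for the directories and create the missing ones in a single pass. This is a sketch under assumptions: the list below contains only the directory names mentioned in this guide, so the complete set the system expects may be larger, and `ensure_directories` is a hypothetical helper name.

```python
from pathlib import Path

# Directory names mentioned in this guide; the repository may expect more.
REQUIRED_DIRS = [
    "Training_Batch_Files",
    "Prediction_Batch_files",
    "application_logging",
    "Training_Logs",
    "Prediction_Logs",
]


def ensure_directories(root, names=REQUIRED_DIRS):
    """Create any missing directory under root and return the names that were created."""
    root = Path(root)
    created = []
    for name in names:
        path = root / name
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(name)
    return created


if __name__ == "__main__":
    # Run from the repository root.
    made = ensure_directories(".")
    print("Created:", ", ".join(made) if made else "nothing, all directories already exist")
```

Running it a second time creates nothing, so it is safe to repeat after pulling new changes.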
Platform-Specific Notes
Linux Installation Notes
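Linux is the recommended platform for production deployments. Ubuntu/Debian systems may need additional system packages installed before the Python dependencies will build. To run the application as a systemd service, define the unit in /etc/systemd/system/fraud-detection.service.
Troubleshooting Common Issues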
Error: 'pip' command not found
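Problem: Pip is not installed or not in PATH.
Solution: Make sure the fraud-detection environment is activated, then invoke pip through the interpreter with `python -m pip`. If pip is missing from the environment, install it (for example with `conda install pip`).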
Error installing XGBoost
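Problem: XGBoost compilation fails on your platform.
Solution 1 - Use conda: Install a prebuilt XGBoost package through conda instead of building it through pip.
Solution 2 - Install build tools: Install your platform's C/C++ build toolchain, then retry the pip installation.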
Error: Failed building wheel for scikit-learn
Problem: Missing build dependencies for scikit-learn.
Solution: Install the missing build dependencies, then rerun the installation from requirements.txt.
ImportError: No module named 'flask'
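Problem: Wrong Python interpreter or environment not activated.
Solution: Activate the fraud-detection environment, confirm that `python` resolves to the interpreter inside it, and reinstall the dependencies from requirements.txt if Flask is still missing.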
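To see which interpreter and environment are actually in use, a short diagnostic can help. The sketch below relies on `CONDA_DEFAULT_ENV`, the variable conda sets to the name of the active environment; the function names are hypothetical and the expected name `fraud-detection` comes from this guide.

```python
import os
import sys


def describe_interpreter(environ=os.environ):
    """Collect the facts that reveal which Python and which conda env are active."""
    return {
        "executable": sys.executable,  # full path of the running interpreter
        "prefix": sys.prefix,          # root of the environment it belongs to
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),  # None if no env is active
    }


def environment_is_active(expected="fraud-detection", environ=os.environ):
    """Return True when the active conda environment has the expected name."""
    return environ.get("CONDA_DEFAULT_ENV") == expected


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key:<12} {value}")
    if not environment_is_active():
        print("The fraud-detection environment does not appear to be active.")
```

If the reported executable lies outside the fraud-detection environment, packages installed there will not be importable from this interpreter.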
Port 5001 already in use
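Problem: Another application is using port 5001.
Solution 1 - Use different port: Start the server on a free port instead of 5001.
Solution 2 - Find and kill process: Identify the process listening on port 5001 (for example with `lsof -i :5001` on Linux/macOS) and stop it.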
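Before choosing either solution, it can be useful to confirm that something is really listening on the port. The following is a minimal, platform-independent sketch using only the standard library; `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds, i.e. something is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 5001  # the port this guide uses for the Flask server
    state = "in use" if port_in_use(port) else "free"
    print(f"Port {port} is {state}")
```

The check only tells you whether the port is occupied, not by which process, so it complements rather than replaces the process lookup in Solution 2.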
SSL Certificate errors
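Problem: Certificate verification fails during package installation.
Solution: If your network intercepts TLS traffic, point pip at your organization's CA bundle with its --cert option, or mark the package index hosts as trusted with --trusted-host.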
Permission denied errors
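Problem: Insufficient permissions to create directories or write files.
Solution:
- Linux/macOS: Give your user ownership of, or write permission on, the project directory.
- Windows: Run the terminal as Administrator, or adjust the permissions of the project folder.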
Memory errors during training
Problem: Insufficient RAM for large datasets.
Solution:
- Reduce batch size - Process smaller chunks of data
- Increase swap space (Linux)
- Use a machine with more RAM
- Optimize data preprocessing - Remove unnecessary columns earlier
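The first and last suggestions can be combined: read the CSV in fixed-size chunks and keep only the columns you need as each row is read. The sketch below uses the standard library's `csv` module so it runs without extra dependencies; pandas offers the same idea through the `chunksize` and `usecols` parameters of `read_csv`. The function name and the column names in the usage line are hypothetical.

```python
import csv
from itertools import islice


def iter_chunks(path, columns, chunk_size=10_000):
    """Yield lists of row dicts with only `columns`, at most `chunk_size` rows per list."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            # islice pulls the next chunk_size rows without loading the whole file.
            chunk = [
                {name: row[name] for name in columns}
                for row in islice(reader, chunk_size)
            ]
            if not chunk:
                return
            yield chunk
```

A caller would loop with something like `for chunk in iter_chunks("data.csv", ["col_a", "col_b"]): ...`, so memory use is bounded by the chunk size rather than by the file size.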
Production Deployment
Using Gunicorn (Linux/macOS)
Gunicorn is already included in requirements.txt.
Using Waitress (Windows)
Environment Variables
Set the required environment variables for production before starting the server.
Verification Checklist
Before proceeding, verify:
- Python 3.8 is installed and active
- Conda environment fraud-detection is created and activated
- All 44 packages from requirements.txt are installed
- `python -c "import flask, sklearn, xgboost"` runs without errors
- `python main.py` starts the Flask server successfully
- Required directories exist (Training_Batch_Files, Prediction_Batch_files, etc.)
- Port 5001 is accessible
- You can access http://localhost:5001/ in a browser
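The last two checklist items can also be checked without a browser. The sketch below sends one HTTP request to the server address given in this guide and treats any HTTP response, including an error status, as proof that the server is reachable. It bypasses proxy settings because the target is local; `server_responds` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def server_responds(url="http://localhost:5001/", timeout=3.0):
    """Return True if an HTTP server answers at url, whatever the status code."""
    # An empty ProxyHandler keeps proxy environment variables from hijacking local requests.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    print("Server reachable" if server_responds() else "Server not reachable on port 5001")
```

Start the server with `python main.py` in another terminal before running the check.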
Once all checklist items are verified, proceed to the Quickstart Guide to train your first model and make predictions.
Uninstallation
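To completely remove the fraud detection system, deactivate the fraud-detection environment, remove it (for example with `conda env remove -n fraud-detection`), and delete the cloned repository directory.
Getting Help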
If you encounter issues not covered in this guide:
- Check logs - Review Training_Logs/ and Prediction_Logs/ for detailed error messages
- Verify data format - Ensure your CSV files match the schema in schema_training.json and schema_prediction.json
- Test with sample data - Use the provided sample files to isolate the issue
- Review dependencies - Run `pip list` to check installed package versions
Quickstart Guide
Ready to start? Follow the quickstart guide to train your first fraud detection model.