
Overview

The data loader modules provide simple interfaces for loading preprocessed data from CSV files for both training and prediction workflows.

Training Data Loader

Class: Data_Getter

Location: source/data_ingestion/data_loader.py
Version: 1.0
Purpose: Loads training data from the database export for model training.

Constructor

Data_Getter(file_object, logger_object)

Parameters:
  • file_object (File, required): File object for logging operations
  • logger_object (Logger, required): Logger instance for tracking data loading
Configuration:
self.training_file = 'Training_FileFromDB/InputFile.csv'

Method: get_data()

Reads training data from the configured CSV file path.
get_data()

Returns:
  • pandas.DataFrame: Complete training dataset including features and labels
Example Usage:
from data_ingestion.data_loader import Data_Getter

# Initialize data getter
data_getter = Data_Getter(file_object, logger_object)

# Load training data
training_data = data_getter.get_data()

print(f"Loaded {len(training_data)} training samples")
print(f"Features: {training_data.columns.tolist()}")
Implementation:
def get_data(self):
    self.logger_object.log(
        self.file_object,
        'Entered the get_data method of the Data_Getter class'
    )
    try:
        self.data = pd.read_csv(self.training_file)
        self.logger_object.log(
            self.file_object,
            'Data Load Successful.Exited the get_data method of the Data_Getter class'
        )
        return self.data
    except Exception as e:
        self.logger_object.log(
            self.file_object,
            'Exception occured in get_data method of the Data_Getter class. Exception message: ' + str(e)
        )
        self.logger_object.log(
            self.file_object,
            'Data Load Unsuccessful.Exited the get_data method of the Data_Getter class'
        )
        raise Exception()
Data Source:
  • Path: Training_FileFromDB/InputFile.csv
  • Format: CSV with headers
  • Content: Raw training data exported from database
The file path Training_FileFromDB/InputFile.csv should contain the data exported by the database validation step.
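Because get_data() collapses all failures into a generic exception, it can be useful to verify the export exists before constructing the loader. A minimal pre-flight check might look like this; the helper name `training_file_ready` is illustrative, not part of the module, and it assumes the process runs from the project root:

```python
from pathlib import Path

TRAINING_FILE = Path("Training_FileFromDB/InputFile.csv")

def training_file_ready(path: Path = TRAINING_FILE) -> bool:
    """Return True if the exported training CSV exists and is non-empty."""
    return path.is_file() and path.stat().st_size > 0
```

Calling this before Data_Getter.get_data() gives a clearer failure message than the bare Exception raised inside the loader.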

Prediction Data Loader

Class: Data_Getter_Pred

Location: source/data_ingestion/data_loader_prediction.py
Version: 1.0
Purpose: Loads new data for making fraud predictions using trained models.

Constructor

Data_Getter_Pred(file_object, logger_object)

Parameters:
  • file_object (File, required): File object for logging operations
  • logger_object (Logger, required): Logger instance for tracking data loading
Configuration:
self.prediction_file = 'Prediction_FileFromDB/InputFile.csv'

Method: get_data()

Reads prediction data from the configured CSV file path.
get_data()

Returns:
  • pandas.DataFrame: Prediction dataset with features (no labels)
Example Usage:
from data_ingestion.data_loader_prediction import Data_Getter_Pred

# Initialize prediction data getter
data_getter = Data_Getter_Pred(file_object, logger_object)

# Load prediction data
prediction_data = data_getter.get_data()

print(f"Loaded {len(prediction_data)} samples for prediction")
print(f"Features: {prediction_data.columns.tolist()}")
Implementation:
def get_data(self):
    self.logger_object.log(
        self.file_object,
        'Entered the get_data method of the Data_Getter class'
    )
    try:
        self.data = pd.read_csv(self.prediction_file)
        self.logger_object.log(
            self.file_object,
            'Data Load Successful.Exited the get_data method of the Data_Getter class'
        )
        return self.data
    except Exception as e:
        self.logger_object.log(
            self.file_object,
            'Exception occured in get_data method of the Data_Getter class. Exception message: ' + str(e)
        )
        self.logger_object.log(
            self.file_object,
            'Data Load Unsuccessful.Exited the get_data method of the Data_Getter class'
        )
        raise Exception()
Data Source:
  • Path: Prediction_FileFromDB/InputFile.csv
  • Format: CSV with headers
  • Content: New data for prediction (without labels)
Prediction data should have the same feature structure as the training data, but without the target label column (fraud_reported).

Complete Data Loading Workflow

Training Pipeline

from data_ingestion.data_loader import Data_Getter
from application_logging.logger import App_Logger

# Initialize logging
file_object = open("Training_Logs/DataLoadLog.txt", 'a+')
logger = App_Logger()

try:
    # Load training data
    data_getter = Data_Getter(file_object, logger)
    data = data_getter.get_data()
    
    print(f"Successfully loaded {len(data)} training records")
    
    # Proceed with preprocessing
    # ...
    
except Exception as e:
    print(f"Data loading failed: {str(e)}")
    
finally:
    file_object.close()

Prediction Pipeline

from data_ingestion.data_loader_prediction import Data_Getter_Pred
from application_logging.logger import App_Logger

# Initialize logging
file_object = open("Prediction_Logs/DataLoadLog.txt", 'a+')
logger = App_Logger()

try:
    # Load prediction data
    data_getter = Data_Getter_Pred(file_object, logger)
    data = data_getter.get_data()
    
    print(f"Successfully loaded {len(data)} records for prediction")
    
    # Proceed with preprocessing and prediction
    # ...
    
except Exception as e:
    print(f"Data loading failed: {str(e)}")
    
finally:
    file_object.close()

Database Integration

The data loaders expect CSV files exported from database validation:

Training Data Export

# Pseudo-code for database export process

1. Validate data schema in database
2. Export validated training records to CSV
3. Save to: Training_FileFromDB/InputFile.csv
4. Data_Getter loads this file for training
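The export steps above can be sketched with pandas and a SQLite connection. This is an illustrative sketch, not the project's actual export code: the database path, the table name `good_raw_data`, and the helper name `export_training_data` are all assumptions.

```python
import sqlite3

import pandas as pd

def export_training_data(db_path: str,
                         out_path: str = "Training_FileFromDB/InputFile.csv") -> int:
    """Export validated training rows from the database to the CSV
    that Data_Getter later reads. Returns the number of exported rows."""
    with sqlite3.connect(db_path) as conn:
        # 'good_raw_data' is a hypothetical table holding validated records
        df = pd.read_sql_query("SELECT * FROM good_raw_data", conn)
    df.to_csv(out_path, index=False, header=True)
    return len(df)
```

The key detail is `index=False`: writing the pandas index would add a spurious column that the training pipeline does not expect.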

Prediction Data Export

# Pseudo-code for prediction data export

1. Receive new data via API/batch upload
2. Validate data schema matches training format
3. Export to: Prediction_FileFromDB/InputFile.csv
4. Data_Getter_Pred loads this file for predictions
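Staging incoming batch data (step 3 above) can be as simple as writing the parsed records to the configured path. A hedged sketch, where `stage_prediction_batch` is an illustrative helper rather than part of the codebase:

```python
import pandas as pd

def stage_prediction_batch(records,
                           out_path: str = "Prediction_FileFromDB/InputFile.csv"):
    """Write incoming records (e.g. rows parsed from an API payload)
    to the CSV that Data_Getter_Pred reads. Returns (rows, columns)."""
    df = pd.DataFrame(records)
    df.to_csv(out_path, index=False)
    return df.shape
```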

Data Expectations

Training Data Format

Required Columns:
  • All feature columns from fraud detection schema
  • Target column: fraud_reported (values: 'Y' or 'N')
Example Structure:
months_as_customer,policy_deductable,policy_csl,insured_sex,fraud_reported
328,1000,250/500,MALE,N
180,2000,100/300,FEMALE,Y
...
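A quick sanity check on the loaded training frame can catch a missing or malformed target column early. This is a sketch; `check_training_labels` is an illustrative helper, not part of the module:

```python
import pandas as pd

def check_training_labels(df: pd.DataFrame, target: str = "fraud_reported") -> None:
    """Raise ValueError if the target column is missing or holds
    labels other than 'Y' / 'N'."""
    if target not in df.columns:
        raise ValueError(f"Missing target column: {target}")
    bad = set(df[target].dropna().unique()) - {"Y", "N"}
    if bad:
        raise ValueError(f"Unexpected labels in {target}: {bad}")
```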

Prediction Data Format

Required Columns:
  • All feature columns matching training data
  • NO target column (fraud_reported should be absent)
Example Structure:
months_as_customer,policy_deductable,policy_csl,insured_sex
425,1500,250/500,MALE
92,2000,500/1000,FEMALE
...
Ensure prediction data has the same feature names and column order as the training data to avoid preprocessing errors.
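That alignment can be enforced programmatically rather than trusted. A minimal sketch, assuming the training column list is available from the training run (the helper name `align_prediction_columns` is illustrative):

```python
import pandas as pd

def align_prediction_columns(pred_df: pd.DataFrame, training_columns) -> pd.DataFrame:
    """Reorder prediction features to match the training order,
    failing fast on missing or unexpected columns."""
    features = [c for c in training_columns if c != "fraud_reported"]
    missing = set(features) - set(pred_df.columns)
    extra = set(pred_df.columns) - set(features)
    if missing or extra:
        raise ValueError(f"Column mismatch. missing: {missing}, extra: {extra}")
    # Selecting by the training feature list reorders the columns
    return pred_df[features]
```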

Error Handling

Both classes funnel every loading failure through the same path: the underlying error is logged, then a bare Exception() is re-raised, so callers see a generic exception and must consult the log for the cause. Common underlying errors include:
  • FileNotFoundError: the CSV file does not exist at the configured path
  • PermissionError: the file exists but cannot be read
  • pandas parsing errors: the file is not valid CSV
All exceptions are logged with details before re-raising.
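For callers who need to distinguish the concrete cause, the pandas errors can be caught before they are collapsed into the generic exception. A sketch of what pd.read_csv itself raises (the function name `try_load` is illustrative):

```python
import pandas as pd

def try_load(path: str):
    """Show the concrete errors pd.read_csv raises; returns the
    DataFrame on success, None on a recognized failure."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"No file at {path}")
    except pd.errors.EmptyDataError:
        print(f"{path} has no columns to parse")
    except pd.errors.ParserError:
        print(f"{path} is not valid CSV")
    return None
```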

Logging

Both classes provide detailed logging: Success Log:
Entered the get_data method of the Data_Getter class
Data Load Successful.Exited the get_data method of the Data_Getter class
Failure Log:
Entered the get_data method of the Data_Getter class
Exception occured in get_data method of the Data_Getter class. Exception message: [error details]
Data Load Unsuccessful.Exited the get_data method of the Data_Getter class
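The logger interface these classes depend on is small: a single log(file_object, message) method. The real App_Logger in application_logging may differ, but a minimal stand-in with the same interface could look like this (the class name `MinimalAppLogger` and the timestamp format are assumptions):

```python
from datetime import datetime

class MinimalAppLogger:
    """Minimal stand-in for App_Logger: writes timestamped lines
    to an already-open file object."""
    def log(self, file_object, message):
        now = datetime.now()
        file_object.write(f"{now.date()}/{now.strftime('%H:%M:%S')}\t\t{message}\n")
```

Such a stand-in is handy for unit-testing the data loaders without touching the real log files.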

Dependencies

import pandas as pd

Best Practices

File Organization

Keep training and prediction files in separate directories

Data Validation

Validate CSV schema before loading to catch issues early

Logging

Always initialize logger before creating data getter instances

Error Handling

Wrap data loading in try-except blocks for graceful failure
