Output Format

The fraud detection system generates predictions in a simple CSV format with a single column containing fraud indicators for each input record.

File Location

Predictions are saved to:
Prediction_Output_File/Predictions.csv
The system automatically deletes any existing Predictions.csv file before generating new predictions to prevent confusion from previous runs.
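This cleanup step can be sketched as follows; the variable name `output_path` is illustrative and not taken from the project's code:

```python
import os

output_path = "Prediction_Output_File/Predictions.csv"

# Delete stale output from a previous run, if present,
# so the new file never mixes old and new predictions
if os.path.exists(output_path):
    os.remove(output_path)
```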

CSV Structure

The output file contains a single column:
Column Name    Data Type    Description
Predictions    String       Fraud indicator: 'Y' or 'N'
Example output:
Predictions
N
N
Y
N
Y
N
N
N

Y/N Encoding

The system uses a simple binary encoding scheme:

Y = Fraud

Indicates that the model detected fraudulent activity in the insurance claim.
Model Output: 1
Risk Level: High

N = Not Fraud

Indicates that the model did not detect fraudulent activity in the claim.
Model Output: 0
Risk Level: Low

Encoding Logic

The encoding is performed in the prediction loop from predictFromModel.py:62-67:
result = model.predict(cluster_data)
for res in result:
    if res == 0:
        predictions.append('N')
    else:
        predictions.append('Y')
The model's raw output is a binary classification (0 or 1), which is converted to human-readable 'N' or 'Y' values for easier interpretation.
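The same 0-to-'N' / 1-to-'Y' conversion can also be written as a vectorized mapping; this is an equivalent sketch, not the code from predictFromModel.py:

```python
import pandas as pd

result = [0, 1, 0, 0, 1]  # example raw model output
predictions = pd.Series(result).map({0: "N", 1: "Y"}).tolist()
print(predictions)  # ['N', 'Y', 'N', 'N', 'Y']
```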

Result Interpretation

Understanding Predictions

Each row in the output file corresponds to a row in the input data file in the same order:
1. Match by Row Number: the first prediction corresponds to the first input record, the second to the second record, and so on.
2. Review 'Y' Predictions: claims marked with 'Y' should be flagged for manual review by fraud investigators.
3. Process 'N' Predictions: claims marked with 'N' can proceed through normal processing workflows.
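Because alignment is purely positional, the steps above can be sketched as a single loop over zipped rows (the claim identifiers below are made up for illustration):

```python
input_ids = [101, 102, 103]  # hypothetical claim identifiers, in input-file order
preds = ["N", "Y", "N"]      # predictions in the same order

for row_num, (claim_id, pred) in enumerate(zip(input_ids, preds), start=1):
    # 'Y' goes to the investigation queue, 'N' proceeds normally
    action = "manual review" if pred == "Y" else "normal processing"
    print(f"Row {row_num}: claim {claim_id} -> {pred} ({action})")
```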

Output Generation Process

The final output is created using pandas from predictFromModel.py:69-71:
final = pd.DataFrame(list(zip(predictions)), columns=['Predictions'])
path = "Prediction_Output_File/Predictions.csv"
final.to_csv("Prediction_Output_File/Predictions.csv", header=True, mode='a+')
  • Predictions are collected in a list during cluster-based processing
  • The list is converted to a pandas DataFrame with a 'Predictions' column
  • The DataFrame is written to CSV with headers included
  • File mode is 'a+' (append), but the file is deleted at the start of each run
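Note that `to_csv` above is not passed `index=False`, so pandas also writes its default integer index as an unnamed first column. A sketch of reading the file back while folding that extra column into the index (`Predictions_demo.csv` is a throwaway name for illustration):

```python
import pandas as pd

# Reproduce the writer's behavior with a small demo frame
final = pd.DataFrame({"Predictions": ["N", "Y", "N"]})
final.to_csv("Predictions_demo.csv", header=True)  # default index=True

# index_col=0 treats the unnamed first column as the index, not data
back = pd.read_csv("Predictions_demo.csv", index_col=0)
print(list(back.columns))  # ['Predictions']
```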

Example Output

Consider a batch of 10 insurance claims (the first five input rows are shown; truncated columns are indicated by "..."):
months_as_customer,policy_annual_premium,incident_severity,...
328,1406,Major Damage,...
228,1197,Minor Damage,...
134,1413,Total Loss,...
256,1415,Minor Damage,...
422,1583,Major Damage,...
Interpretation:
  • Claims 1, 2, 4, 6, 7, 8, 10: No fraud detected (N)
  • Claims 3, 5, 9: Potential fraud detected (Y) - require investigation

Working with Results

Combining with Input Data

To create a comprehensive report, combine the predictions with the original input data:
import pandas as pd

# Load input data
input_data = pd.read_csv('Prediction_FileFromDB/InputFile.csv')

# Load predictions
predictions = pd.read_csv('Prediction_Output_File/Predictions.csv')

# Combine
results = pd.concat([input_data, predictions], axis=1)

# Filter fraud cases
fraud_cases = results[results['Predictions'] == 'Y']

# Save combined results
results.to_csv('Complete_Predictions_Report.csv', index=False)

Filtering High-Risk Claims

Identify claims that require investigation:
import pandas as pd

# Load combined results
results = pd.read_csv('Complete_Predictions_Report.csv')

# Get fraud predictions
fraud_claims = results[results['Predictions'] == 'Y']

print(f"Total claims processed: {len(results)}")
print(f"Fraudulent claims detected: {len(fraud_claims)}")
print(f"Fraud rate: {len(fraud_claims)/len(results)*100:.2f}%")

# Save for investigation
fraud_claims.to_csv('Fraud_Investigation_Queue.csv', index=False)

Prediction Statistics

Tracking Fraud Rates

Monitor fraud detection trends over time:
import pandas as pd
from collections import Counter

predictions = pd.read_csv('Prediction_Output_File/Predictions.csv')
counts = Counter(predictions['Predictions'])

total = len(predictions)
fraud_count = counts['Y']
legit_count = counts['N']

print(f"Total Predictions: {total}")
print(f"Fraud Detected (Y): {fraud_count} ({fraud_count/total*100:.1f}%)")
print(f"No Fraud (N): {legit_count} ({legit_count/total*100:.1f}%)")

Logging and Audit Trail

All prediction operations are logged to:
Prediction_Logs/Prediction_Log.txt
Log entries include:
  • Start and end timestamps
  • Number of records processed
  • Any errors or exceptions
  • Model loading events
Example log entry:
2026-03-04 14:30:15 - Start of Prediction
2026-03-04 14:30:16 - Data Load Successful
2026-03-04 14:30:18 - Preprocessing completed
2026-03-04 14:30:19 - KMeans model loaded
2026-03-04 14:30:22 - Cluster 0 model loaded: XGBClassifier0
2026-03-04 14:30:24 - Cluster 1 model loaded: RandomForestClassifier1
2026-03-04 14:30:26 - End of Prediction
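Since every entry starts with a fixed-width timestamp, the log can be parsed to compute run duration; a sketch assuming exactly the format shown above:

```python
from datetime import datetime

log_lines = [
    "2026-03-04 14:30:15 - Start of Prediction",
    "2026-03-04 14:30:26 - End of Prediction",
]

def parse_ts(line):
    # The timestamp occupies the first 19 characters: "YYYY-MM-DD HH:MM:SS"
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")

duration = parse_ts(log_lines[-1]) - parse_ts(log_lines[0])
print(f"Prediction run took {duration.total_seconds():.0f} seconds")  # 11 seconds
```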

Best Practices

Always verify that the number of predictions matches the number of input records:
import pandas as pd

input_rows = len(pd.read_csv('input_file.csv'))
prediction_rows = len(pd.read_csv('Prediction_Output_File/Predictions.csv'))
assert input_rows == prediction_rows, "Row count mismatch!"
Save prediction results with timestamps for audit trails:
from datetime import datetime
import os
import shutil

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
archive_path = f"Prediction_Archive/Predictions_{timestamp}.csv"
os.makedirs("Prediction_Archive", exist_ok=True)  # ensure the archive directory exists
shutil.copy('Prediction_Output_File/Predictions.csv', archive_path)
Check for empty output files before processing:
import os
import pandas as pd

if os.path.exists('Prediction_Output_File/Predictions.csv'):
    predictions = pd.read_csv('Prediction_Output_File/Predictions.csv')
    if len(predictions) == 0:
        print("Warning: No predictions generated")
else:
    print("Error: Prediction file not found")

Next Steps

Prediction Overview

Review the complete prediction workflow

Batch Prediction

Learn how to process batch files

Data Validation

Understand data validation requirements
