CreditWiseLoan is a binary classification project that predicts whether a loan application should be approved or rejected based on the applicant’s financial and credit profile. The system ingests structured applicant data — including income, credit history, loan amount, and employment details — and outputs a binary approval decision alongside a confidence score. This project demonstrates how machine learning can assist credit risk assessment workflows by identifying patterns in historical lending decisions.

Overview

Loan approval decisions involve balancing risk (probability of default) against opportunity (approving creditworthy applicants). A binary classifier trained on historical approval data can surface the strongest predictors of creditworthiness and help standardize decision-making.

  • Problem type: Binary classification
  • Target variable: Loan_Status (0 = rejected, 1 = approved)
  • Dataset: Credit loan application dataset (Kaggle)
  • Project name: CreditWiseLoan Approval

Dataset

The dataset contains structured records of loan applications with the following features:

| Feature | Type | Description |
| --- | --- | --- |
| Gender | Categorical | Applicant gender (Male / Female) |
| Married | Categorical | Marital status (Yes / No) |
| Dependents | Categorical | Number of dependents (0, 1, 2, 3+) |
| Education | Categorical | Education level (Graduate / Not Graduate) |
| Self_Employed | Categorical | Self-employment status (Yes / No) |
| ApplicantIncome | Numeric | Monthly income of the primary applicant (USD) |
| CoapplicantIncome | Numeric | Monthly income of the co-applicant (USD) |
| LoanAmount | Numeric | Loan amount requested (thousands USD) |
| Loan_Amount_Term | Numeric | Repayment term in months |
| Credit_History | Binary | Credit history meets guidelines (1 = yes, 0 = no) |
| Property_Area | Categorical | Property location (Urban / Semiurban / Rural) |
| Loan_Status | Binary (target) | Approval decision (1 = approved, 0 = rejected) |

Missing value handling

Several columns contain missing values that must be imputed before training:
  • LoanAmount: Impute with median
  • Loan_Amount_Term: Impute with mode (360 months)
  • Credit_History: Impute with mode (1.0)
  • Categorical columns (Gender, Married, Dependents, Self_Employed): Impute with mode
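
The imputation rules above can be sketched in plain Python. This is an illustration only: the sample rows are made up, and the helper names `fit_fill_values` and `impute` are hypothetical rather than taken from the project notebook. With pandas, the same idea is usually expressed with `fillna` and the column median or mode.

```python
from statistics import median, mode

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"Gender": "Male", "LoanAmount": 120.0, "Loan_Amount_Term": 360, "Credit_History": 1.0},
    {"Gender": None, "LoanAmount": None, "Loan_Amount_Term": 360, "Credit_History": None},
    {"Gender": "Female", "LoanAmount": 200.0, "Loan_Amount_Term": None, "Credit_History": 1.0},
    {"Gender": "Male", "LoanAmount": 90.0, "Loan_Amount_Term": 180, "Credit_History": 0.0},
]

MEDIAN_COLUMNS = ["LoanAmount"]
# The full dataset would also list Married, Dependents, and Self_Employed here.
MODE_COLUMNS = ["Loan_Amount_Term", "Credit_History", "Gender"]


def fit_fill_values(rows):
    """Compute one fill value per column from the observed (non-missing) entries."""
    fill = {}
    for col in MEDIAN_COLUMNS:
        fill[col] = median(r[col] for r in rows if r[col] is not None)
    for col in MODE_COLUMNS:
        fill[col] = mode(r[col] for r in rows if r[col] is not None)
    return fill


def impute(rows, fill):
    """Return copies of the rows with missing entries replaced by the fill values."""
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


fill_values = fit_fill_values(rows)
imputed = impute(rows, fill_values)
```

Computing the fill values once and reusing them means the same values can be applied to new applications at prediction time.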

Feature engineering

Two derived features improve model performance:
  • TotalIncome = ApplicantIncome + CoapplicantIncome — combined household income
  • LoanAmountLog = log(LoanAmount) — log-transforms the right-skewed loan amount distribution to reduce the influence of outliers
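
A minimal sketch of the two derived features for a single record. The function name `add_derived_features` is hypothetical, and the natural logarithm is an assumption, since the page only says `log`.

```python
import math


def add_derived_features(row):
    """Return a copy of the row with TotalIncome and LoanAmountLog added."""
    out = dict(row)
    out["TotalIncome"] = row["ApplicantIncome"] + row["CoapplicantIncome"]
    # Natural log (assumed); LoanAmount must be positive and already imputed.
    out["LoanAmountLog"] = math.log(row["LoanAmount"])
    return out


example = add_derived_features(
    {"ApplicantIncome": 5000, "CoapplicantIncome": 1500, "LoanAmount": 150}
)
```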

Preprocessing pipeline
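
The page does not list the pipeline steps. The variable name `X_train_scaled` used in the logistic regression snippet suggests that numeric features are standardized after a train/test split. The sketch below assumes exactly that and shows the scaling statistics being fitted on the training rows only, then reused for the test rows. The helper names and sample values are hypothetical; in scikit-learn, `StandardScaler` plays this role.

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Compute per-column mean and population standard deviation from training rows."""
    columns = list(zip(*train_rows))
    means = [mean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows using statistics fitted elsewhere."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical numeric features: [TotalIncome, LoanAmountLog]
X_train = [[6500.0, 5.01], [4500.0, 4.61], [9000.0, 5.30]]
X_test = [[5200.0, 4.79]]

means, stds = fit_scaler(X_train)
X_train_scaled = transform(X_train, means, stds)
X_test_scaled = transform(X_test, means, stds)  # reuses the training statistics
```

Fitting the scaler on the training split alone keeps information from the test split out of the model.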

Models

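Three classifiers are trained and compared: logistic regression, Random Forest, and XGBoost.

Logistic regression establishes the baseline. It is interpretable and fast to train, which makes it suitable for understanding which features drive approval decisions.

from sklearn.linear_model import LogisticRegression

# C is the inverse regularization strength; max_iter is raised to help the solver converge
model = LogisticRegression(C=1.0, max_iter=1000)
# X_train_scaled and y_train are the scaled training features and Loan_Status labels
model.fit(X_train_scaled, y_train)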

Key predictors

Credit history is the single most important feature in most trained models — applicants without a positive credit history are very rarely approved. Other high-importance features include:
  1. Credit_History — strongest predictor of approval
  2. TotalIncome — higher combined income increases approval probability
  3. LoanAmount — very high loan amounts relative to income reduce approval likelihood
  4. Property_Area — semiurban properties show higher approval rates in some datasets
  5. Education — graduate applicants are approved at slightly higher rates

Model performance

| Model | Accuracy | Precision | Recall | AUC |
| --- | --- | --- | --- | --- |
| Logistic Regression | ~80% | ~82% | ~88% | 0.82 |
| Random Forest | ~82% | ~84% | ~89% | 0.85 |
| XGBoost | ~83% | ~85% | ~90% | 0.87 |

The dataset is moderately imbalanced (~69% approved, ~31% rejected), so evaluate using AUC and F1 in addition to accuracy. Consider raising the decision threshold if minimizing false approvals (false positives, i.e. type I errors when approval is the positive class) is a priority.
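
To illustrate the effect of the decision threshold, the sketch below counts confusion-matrix cells at two thresholds on the predicted approval probability. The labels and probabilities are invented for illustration and are not outputs of the project's models; `confusion_counts` is a hypothetical helper.

```python
def confusion_counts(y_true, p_approved, threshold):
    """Count TP/FP/TN/FN, treating 'approved' (1) as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, p_approved):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1  # false approval
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # false rejection
    return counts


# Hypothetical ground truth and predicted approval probabilities.
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
p_approved = [0.92, 0.81, 0.66, 0.58, 0.62, 0.35, 0.45, 0.71]

at_default = confusion_counts(y_true, p_approved, threshold=0.5)
at_strict = confusion_counts(y_true, p_approved, threshold=0.7)
```

On this toy data, moving the threshold from 0.5 to 0.7 cuts false approvals from 2 to 1 but raises false rejections from 1 to 3, which is the trade-off to weigh when choosing a threshold.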

API design

POST /predict

Accepts a structured loan application and returns a binary approval decision.

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| gender | string | required | Applicant gender. Accepted values: "Male", "Female". |
| married | string | required | Marital status. Accepted values: "Yes", "No". |
| dependents | string | required | Number of dependents. Accepted values: "0", "1", "2", "3+". |
| education | string | required | Education level. Accepted values: "Graduate", "Not Graduate". |
| self_employed | string | required | Self-employment status. Accepted values: "Yes", "No". |
| applicant_income | number | required | Monthly income of the primary applicant in USD. |
| coapplicant_income | number | required | Monthly income of the co-applicant in USD. Use 0 if no co-applicant. |
| loan_amount | number | required | Requested loan amount in thousands of USD. |
| loan_amount_term | integer | required | Repayment term in months (e.g., 360 for 30 years). |
| credit_history | integer | required | Whether the applicant’s credit history meets lender guidelines. 1 = yes, 0 = no. |
| property_area | string | required | Property location. Accepted values: "Urban", "Semiurban", "Rural". |

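
The request schema above lends itself to a small validation step before the model is called. The following is a sketch only: `validate_application` is a hypothetical helper, and the page does not say whether or how `app.py` validates input.

```python
ALLOWED_VALUES = {
    "gender": ("Male", "Female"),
    "married": ("Yes", "No"),
    "dependents": ("0", "1", "2", "3+"),
    "education": ("Graduate", "Not Graduate"),
    "self_employed": ("Yes", "No"),
    "property_area": ("Urban", "Semiurban", "Rural"),
}
NUMERIC_FIELDS = ("applicant_income", "coapplicant_income", "loan_amount")
INTEGER_FIELDS = ("loan_amount_term", "credit_history")


def validate_application(payload):
    """Return a list of error strings; an empty list means the payload is valid."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        if field not in payload:
            errors.append(f"{field}: missing")
        elif payload[field] not in allowed:
            errors.append(f"{field}: must be one of {list(allowed)}")
    for field in NUMERIC_FIELDS:
        value = payload.get(field)
        if field not in payload:
            errors.append(f"{field}: missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{field}: must be a number")
    for field in INTEGER_FIELDS:
        value = payload.get(field)
        if field not in payload:
            errors.append(f"{field}: missing")
        elif isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{field}: must be an integer")
        elif field == "credit_history" and value not in (0, 1):
            errors.append("credit_history: must be 0 or 1")
    return errors


example_payload = {
    "gender": "Male", "married": "Yes", "dependents": "1",
    "education": "Graduate", "self_employed": "No",
    "applicant_income": 5000, "coapplicant_income": 1500,
    "loan_amount": 150, "loan_amount_term": 360,
    "credit_history": 1, "property_area": "Semiurban",
}
example_errors = validate_application(example_payload)
```

Returning all errors at once, rather than failing on the first, lets a client fix a rejected request in one round trip.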
Example request
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "gender": "Male",
    "married": "Yes",
    "dependents": "1",
    "education": "Graduate",
    "self_employed": "No",
    "applicant_income": 5000,
    "coapplicant_income": 1500,
    "loan_amount": 150,
    "loan_amount_term": 360,
    "credit_history": 1,
    "property_area": "Semiurban"
  }'
Example response
{
  "loan_status": 1,
  "decision": "Approved",
  "confidence": 0.87
}
Response fields

| Field | Type | Description |
| --- | --- | --- |
| loan_status | integer | Binary approval decision. 1 = approved, 0 = rejected. |
| decision | string | Human-readable decision label: "Approved" or "Rejected". |
| confidence | number | Model confidence score (0.0–1.0) for the predicted outcome. |
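
One plausible way to derive the three response fields from a classifier's predicted approval probability is sketched below. It assumes a 0.5 decision threshold and interprets `confidence` as the probability of the predicted class; neither detail is confirmed by the page, and `build_response` is a hypothetical helper.

```python
def build_response(p_approved, threshold=0.5):
    """Map a predicted approval probability to the documented response fields."""
    loan_status = 1 if p_approved >= threshold else 0
    # Confidence is reported for the predicted outcome, not always for approval.
    confidence = p_approved if loan_status == 1 else 1.0 - p_approved
    return {
        "loan_status": loan_status,
        "decision": "Approved" if loan_status == 1 else "Rejected",
        "confidence": round(confidence, 2),
    }


approved = build_response(0.87)
rejected = build_response(0.2)
```

With a threshold other than 0.5, the confidence of the predicted class can fall below 0.5, so clients should not assume a lower bound.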

Running the project

1. Install dependencies

cd ML_To_Train/26_CreditWiseLoan_appoval
pip install -r requirements.txt

2. Explore data and train models

jupyter notebook CreditWiseLoan_approval.ipynb

The notebook covers EDA, feature engineering, model training, and evaluation.

3. Start the API

cd src
python app.py

4. Submit a loan application

curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "gender": "Female",
    "married": "No",
    "dependents": "0",
    "education": "Graduate",
    "self_employed": "No",
    "applicant_income": 4500,
    "coapplicant_income": 0,
    "loan_amount": 100,
    "loan_amount_term": 360,
    "credit_history": 1,
    "property_area": "Urban"
  }'
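
The same request can be sent from Python using only the standard library. This is a sketch that assumes the API is running locally on port 5000 as in the curl example; the function names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(application, base_url="http://localhost:5000"):
    """Build a POST request for /predict with a JSON body."""
    body = json.dumps(application).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(application, base_url="http://localhost:5000", timeout=10):
    """Send the application to the API and return the decoded JSON response."""
    request = build_predict_request(application, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


application = {
    "gender": "Female", "married": "No", "dependents": "0",
    "education": "Graduate", "self_employed": "No",
    "applicant_income": 4500, "coapplicant_income": 0,
    "loan_amount": 100, "loan_amount_term": 360,
    "credit_history": 1, "property_area": "Urban",
}
request = build_predict_request(application)
# Call predict(application) once the server from step 3 is running.
```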

Project structure

26_CreditWiseLoan_appoval/
├── src/
│   └── app.py                        # Flask API entry point
├── CreditWiseLoan_approval.ipynb     # Full analysis and training notebook
└── readme.md

Feature importances from the Random Forest or XGBoost model summarize which features drive the model's decisions overall. Extract model.feature_importances_ and map the values to feature names to rank them. These scores are global, one value per feature for the whole model, so explaining an individual approval or rejection requires a per-prediction attribution method such as SHAP.
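
A minimal sketch of mapping importance scores to feature names and ranking them. The scores below are invented placeholders, not values from a trained model. In practice `importances` would come from `model.feature_importances_`, and `rank_features` is a hypothetical helper.

```python
def rank_features(feature_names, importances):
    """Pair each feature name with its importance and sort from highest to lowest."""
    if len(feature_names) != len(importances):
        raise ValueError("feature_names and importances must have the same length")
    pairs = [(name, float(score)) for name, score in zip(feature_names, importances)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only.
feature_names = ["Education", "Credit_History", "LoanAmountLog", "TotalIncome", "Property_Area"]
importances = [0.05, 0.45, 0.15, 0.25, 0.10]

ranking = rank_features(feature_names, importances)
```

The resulting ranking describes the model as a whole; it does not say why one particular application was approved or rejected.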
