CreditWiseLoan is a binary classification project that predicts whether a loan application should be approved or rejected based on the applicant’s financial and credit profile. The system ingests structured applicant data (including income, credit history, loan amount, and employment details) and outputs a binary approval decision alongside a confidence score. This project demonstrates how machine learning can assist credit risk assessment workflows by identifying patterns in historical lending decisions.
Overview
Loan approval decisions involve balancing risk (probability of default) against opportunity (approving creditworthy applicants). A binary classifier trained on historical approval data can surface the strongest predictors of creditworthiness and help standardize decision-making.

- Problem type: Binary classification
- Target variable: `Loan_Status` (0 = rejected, 1 = approved)
- Dataset: Credit loan application dataset (Kaggle)
- Project name: CreditWiseLoan Approval
Dataset
The dataset contains structured records of loan applications with the following features:

| Feature | Type | Description |
|---|---|---|
| `Gender` | Categorical | Applicant gender (Male / Female) |
| `Married` | Categorical | Marital status (Yes / No) |
| `Dependents` | Categorical | Number of dependents (0, 1, 2, 3+) |
| `Education` | Categorical | Education level (Graduate / Not Graduate) |
| `Self_Employed` | Categorical | Self-employment status (Yes / No) |
| `ApplicantIncome` | Numeric | Monthly income of the primary applicant (USD) |
| `CoapplicantIncome` | Numeric | Monthly income of the co-applicant (USD) |
| `LoanAmount` | Numeric | Loan amount requested (thousands USD) |
| `Loan_Amount_Term` | Numeric | Repayment term in months |
| `Credit_History` | Binary | Credit history meets guidelines (1 = yes, 0 = no) |
| `Property_Area` | Categorical | Property location (Urban / Semiurban / Rural) |
| `Loan_Status` | Binary (target) | Approval decision (1 = approved, 0 = rejected) |
Missing value handling
Several columns contain missing values that must be imputed before training:

- `LoanAmount`: Impute with median
- `Loan_Amount_Term`: Impute with mode (360 months)
- `Credit_History`: Impute with mode (1.0)
- Categorical columns (`Gender`, `Married`, `Dependents`, `Self_Employed`): Impute with mode
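The imputation rules above can be sketched without any third-party dependency. This is a minimal illustration using only the standard library, assuming each record is a dict in which `None` marks a missing value; the `impute` helper and the `sample` rows are hypothetical and not taken from the project. With pandas, the same idea is usually expressed with `fillna`.

```python
from statistics import median, mode

# Column -> statistic used to fill gaps, following the list above.
STRATEGY = {
    "LoanAmount": median,
    "Loan_Amount_Term": mode,
    "Credit_History": mode,
    "Gender": mode,
    "Married": mode,
    "Dependents": mode,
    "Self_Employed": mode,
}

def impute(rows):
    """Return copies of `rows` with missing values (None) filled per column."""
    fill = {}
    for col, stat in STRATEGY.items():
        observed = [r[col] for r in rows if r.get(col) is not None]
        if observed:  # skip columns that have no observed values at all
            fill[col] = stat(observed)
    return [
        {**r, **{c: v for c, v in fill.items() if r.get(c) is None}}
        for r in rows
    ]

# Made-up records for illustration only.
sample = [
    {"LoanAmount": 100, "Loan_Amount_Term": 360, "Credit_History": 1.0, "Gender": "Male"},
    {"LoanAmount": None, "Loan_Amount_Term": None, "Credit_History": None, "Gender": None},
    {"LoanAmount": 200, "Loan_Amount_Term": 360, "Credit_History": 1.0, "Gender": "Male"},
    {"LoanAmount": 130, "Loan_Amount_Term": 180, "Credit_History": 0.0, "Gender": "Female"},
]
filled = impute(sample)
```

In a real workflow the fill values should be computed on the training split only and then reused for validation and test data, so that no information leaks across splits.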
Feature engineering
Two derived features improve model performance:

- `TotalIncome` = `ApplicantIncome` + `CoapplicantIncome`: combined household income
- `LoanAmountLog` = `log(LoanAmount)`: log-transforms the right-skewed loan amount distribution to reduce the influence of outliers
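A minimal sketch of the two derived features, assuming a record is a plain dict and that `log` means the natural logarithm (the base is not stated above). The `add_features` helper is a hypothetical name used for illustration.

```python
import math

def add_features(row):
    """Return a copy of `row` with TotalIncome and LoanAmountLog added."""
    out = dict(row)
    out["TotalIncome"] = row["ApplicantIncome"] + row["CoapplicantIncome"]
    # Assumes LoanAmount > 0 (already imputed); log is undefined otherwise.
    out["LoanAmountLog"] = math.log(row["LoanAmount"])
    return out

# Made-up record for illustration only.
example = add_features(
    {"ApplicantIncome": 5000, "CoapplicantIncome": 1500, "LoanAmount": 128}
)
```

Because the transform compresses large values, a loan ten times larger shifts `LoanAmountLog` by only about 2.3 rather than by a factor of ten.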
Preprocessing pipeline
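One possible encoding step for such a pipeline is sketched below. It turns a cleaned record into a numeric feature vector, assuming binary mappings for the two-valued categorical columns, `"3+"` treated as 3, and a one-hot encoding for `Property_Area`. These encoding choices and the `encode` helper are assumptions for illustration, not the project's confirmed pipeline.

```python
# Hypothetical encodings; the category values come from the dataset table.
BINARY_MAPS = {
    "Gender": {"Male": 1, "Female": 0},
    "Married": {"Yes": 1, "No": 0},
    "Education": {"Graduate": 1, "Not Graduate": 0},
    "Self_Employed": {"Yes": 1, "No": 0},
}
AREAS = ["Urban", "Semiurban", "Rural"]
NUMERIC = ["ApplicantIncome", "CoapplicantIncome", "LoanAmount",
           "Loan_Amount_Term", "Credit_History"]

def encode(row):
    """Map one imputed record to a flat list of floats."""
    vec = [float(BINARY_MAPS[col][row[col]]) for col in BINARY_MAPS]
    # "3+" is treated as 3 (an assumption).
    vec.append(float(str(row["Dependents"]).rstrip("+")))
    vec += [float(row[col]) for col in NUMERIC]
    # One-hot encode the property area.
    vec += [1.0 if row["Property_Area"] == area else 0.0 for area in AREAS]
    return vec

# Made-up record for illustration only.
vector = encode({
    "Gender": "Male", "Married": "Yes", "Education": "Graduate",
    "Self_Employed": "No", "Dependents": "3+",
    "ApplicantIncome": 5000, "CoapplicantIncome": 0, "LoanAmount": 128,
    "Loan_Amount_Term": 360, "Credit_History": 1, "Property_Area": "Semiurban",
})
```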
Models
Three classifiers are trained and compared:

- Logistic Regression
- Random Forest
- XGBoost
Logistic regression establishes the baseline. It is interpretable and fast to train, making it suitable for understanding which features drive approval decisions.
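To make the baseline concrete, here is a from-scratch sketch of logistic regression trained with batch gradient descent, using only the standard library. It is an illustration of the technique, not the project's training code; a library implementation such as scikit-learn's `LogisticRegression` would normally be used. The toy data, learning rate, and epoch count are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability of approval for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data, made up for illustration: [Credit_History, Self_Employed]
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
w, b = fit_logistic(X, y)
```

The interpretability mentioned above comes from the learned weights: each entry of `w` is the change in log-odds of approval per unit change in the corresponding feature.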
Key predictors
Credit history is the single most important feature in most trained models: applicants without a positive credit history are very rarely approved. The high-importance features are:

- `Credit_History`: strongest predictor of approval
- `TotalIncome`: higher combined income increases approval probability
- `LoanAmount`: very high loan amounts relative to income reduce approval likelihood
- `Property_Area`: semiurban properties show higher approval rates in some datasets
- `Education`: graduate applicants are approved at slightly higher rates
Model performance
| Model | Accuracy | Precision | Recall | AUC |
|---|---|---|---|---|
| Logistic Regression | ~80% | ~82% | ~88% | 0.82 |
| Random Forest | ~82% | ~84% | ~89% | 0.85 |
| XGBoost | ~83% | ~85% | ~90% | 0.87 |
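The dataset is moderately imbalanced (~69% approved, ~31% rejected), so a model that approves every application would already reach about 69% accuracy. Evaluate using AUC and F1 in addition to accuracy. Consider raising the decision threshold if minimizing false approvals (false positives, i.e. type I errors when "approved" is the positive class) is a priority.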
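The effect of moving the decision threshold can be sketched as follows. The labels and scores are made up for illustration, and the helper names are hypothetical; the point is only that a higher threshold trades recall for fewer false approvals.

```python
def confusion(y_true, scores, threshold=0.5):
    """Return (tp, fp, fn, tn) with 'approved' (1) as the positive class."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1  # false approval
        elif pred == 0 and y == 1:
            fn += 1  # creditworthy applicant rejected
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up predicted P(approved) for eight applications.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
scores = [0.95, 0.90, 0.80, 0.65, 0.55, 0.70, 0.40, 0.20]

for t in (0.5, 0.75):
    tp, fp, fn, tn = confusion(y_true, scores, t)
    print(t, "false approvals:", fp, precision_recall_f1(tp, fp, fn))
```

On this toy data, raising the threshold from 0.5 to 0.75 removes the single false approval but rejects two applicants who should have been approved.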
API design
POST /predict
Accepts a structured loan application and returns a binary approval decision.

Request body

- Applicant gender. Accepted values: `"Male"`, `"Female"`.
- Marital status. Accepted values: `"Yes"`, `"No"`.
- Number of dependents. Accepted values: `"0"`, `"1"`, `"2"`, `"3+"`.
- Education level. Accepted values: `"Graduate"`, `"Not Graduate"`.
- Self-employment status. Accepted values: `"Yes"`, `"No"`.
- Monthly income of the primary applicant in USD.
- Monthly income of the co-applicant in USD. Use `0` if no co-applicant.
- Requested loan amount in thousands of USD.
- Repayment term in months (e.g., 360 for 30 years).
- Whether the applicant’s credit history meets lender guidelines. `1` = yes, `0` = no.
- Property location. Accepted values: `"Urban"`, `"Semiurban"`, `"Rural"`.

Response

- Binary approval decision. `1` = approved, `0` = rejected.
- Human-readable decision label: `"Approved"` or `"Rejected"`.
- Model confidence score (0.0–1.0) for the predicted outcome.
Running the project
Explore data and train models