This project predicts whether an employee is at risk of leaving an organization using three logistic regression models trained on HR analytics data. The system exposes a REST API that accepts a feature vector describing an employee’s profile and returns independent predictions and confidence scores from each model, giving HR teams a multi-perspective view of retention risk.
Overview
Employee turnover is a classification problem: given a set of features describing an employee, predict whether they will leave (1) or stay (0). The project uses three logistic regression variants to balance interpretability, regularization, and feature sparsity:
- Baseline Logistic Regression: unregularized; establishes a performance floor
- L1 Regularized (Lasso): promotes feature sparsity; identifies the smallest predictive feature set
- L2 Regularized (Ridge): stabilizes coefficients; handles correlated features
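For reference, the three variants differ only in the penalty added to the logistic loss. The notation below (weights $w$, bias $b$, sigmoid $\sigma$, penalty strength $\lambda$, and $n$ training rows) is introduced here for illustration and is not taken from the project's notebook.

$$
\mathcal{L}(w, b) = -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log \sigma(w^\top x_i + b) + (1 - y_i)\log\big(1 - \sigma(w^\top x_i + b)\big) \Big]
$$

$$
\text{Baseline: } \min_{w,b}\ \mathcal{L}(w, b)
\qquad
\text{L1: } \min_{w,b}\ \mathcal{L}(w, b) + \lambda \lVert w \rVert_1
\qquad
\text{L2: } \min_{w,b}\ \mathcal{L}(w, b) + \lambda \lVert w \rVert_2^2
$$

The L1 penalty can drive individual weights to exactly zero, which is where the sparsity comes from. The L2 penalty shrinks all weights smoothly, which keeps coefficients stable when features are correlated.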
Target variable: Employee turnover (0 = stays, 1 = leaves)
Dataset: employee_turnover.csv
Key features
The model uses 15 input features covering multiple dimensions of an employee’s profile:

| Feature category | Examples |
|---|---|
| Performance | Satisfaction score, last evaluation score, average monthly hours |
| Work history | Number of projects, time at company, years since last promotion |
| Compensation | Salary band, department code |
| Engagement | Work accident indicator, promotion in last 5 years |
Satisfaction and evaluation scores must be normalized to the [0, 1] range before they are sent to the API. The API applies StandardScaler internally, but categorical fields such as department and salary band must be integer-encoded by the caller.
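A minimal sketch of the caller-side preparation described above. The `SALARY_CODES` mapping, the raw 0 to 100 score scale in the usage note, and both helper names are hypothetical; the actual integer codes must match whatever encoding was used when the models were trained.

```python
# Hypothetical encoding; replace with the codes used at training time.
SALARY_CODES = {"low": 0, "medium": 1, "high": 2}


def min_max(value, lo, hi):
    """Rescale value from [lo, hi] to [0, 1], clamping out-of-range input."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def encode_category(label, codes):
    """Map a categorical label to its integer code (case-insensitive)."""
    try:
        return codes[label.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown category: {label!r}") from None
```

For example, a satisfaction score recorded on a 0 to 100 scale would be passed as `min_max(score, 0, 100)`, and a salary band as `encode_category(band, SALARY_CODES)`.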
Model performance
All three models achieve strong predictive performance on the test set:

| Model | Accuracy | F1 Score | AUC |
|---|---|---|---|
| Baseline | ~0.89 | High | 0.94 |
| L1 (Lasso) | ~0.89+ | Best | 0.94 |
| L2 (Ridge) | ~0.89 | Stable | 0.94 |
API design
Prediction pipeline
Internal model pipeline
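The pipeline described on this page (one shared scaler, then three models queried independently) can be sketched as follows. This is an illustration rather than the project's actual code: it assumes all four pickle files sit in a single `models/` directory, that the unpickled objects expose scikit-learn's `transform`, `predict`, and `predict_proba` methods, and the result keys (`baseline`, `l1`, `l2`, `prediction`, `confidence`) are hypothetical names.

```python
import pickle
from pathlib import Path

# File names come from this page; the dictionary keys are hypothetical labels.
MODEL_FILES = {
    "baseline": "baseline_model.pkl",
    "l1": "l1_model.pkl",
    "l2": "l2_model.pkl",
}


def load_artifacts(model_dir="models"):
    """Load the scaler and the three models from pickle files."""
    model_dir = Path(model_dir)
    with open(model_dir / "scaler.pkl", "rb") as fh:
        scaler = pickle.load(fh)
    models = {}
    for name, filename in MODEL_FILES.items():
        with open(model_dir / filename, "rb") as fh:
            models[name] = pickle.load(fh)
    return scaler, models


def predict_all(features, scaler, models):
    """Scale one feature vector, then query every model independently."""
    scaled = scaler.transform([list(features)])  # shape: (1, n_features)
    results = {}
    for name, model in models.items():
        label = int(model.predict(scaled)[0])
        # Confidence = probability of the predicted class (the larger of the two).
        confidence = float(max(model.predict_proba(scaled)[0]))
        results[name] = {"prediction": label, "confidence": confidence}
    return results
```

Loading the artifacts once at startup and reusing them for every request keeps per-request work to one `transform` call and three `predict` calls.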
POST /predict
Accepts a 15-element feature vector and returns predictions from all three models with confidence scores.
Request body
A 15-element numeric array representing the employee’s profile. Elements correspond to: satisfaction level, last evaluation, number of projects, average monthly hours, time at company, work accident, left, promotion last 5 years, department (encoded), salary band (encoded), and additional HR metrics.
Response
- Binary turnover prediction from the unregularized logistic regression model. 0 = stays, 1 = leaves.
- Binary turnover prediction from the L1-regularized (Lasso) logistic regression model.
- Binary turnover prediction from the L2-regularized (Ridge) logistic regression model.
- Per-model probability scores (0.0–1.0) representing each model’s confidence that its prediction is correct.
| Status | Condition |
|---|---|
| 400 | Feature array has wrong length or is missing |
| 405 | Non-POST request to /predict |
Request handling logic
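A framework-agnostic sketch of the checks implied by the status table for POST /predict: reject non-POST methods with 405, reject a missing or wrong-length feature array with 400, and otherwise run inference. The function name, the error message wording, the injected `predict_fn` callable, and the `features` field name are all assumptions made for illustration; the real field name is not given on this page.

```python
N_FEATURES = 15
FEATURE_KEY = "features"  # hypothetical field name, not confirmed by this page


def handle_predict(method, payload, predict_fn):
    """Return (status_code, body) for a request to /predict.

    method     -- HTTP method as an upper-case string
    payload    -- parsed JSON body (expected to be a dict), or None
    predict_fn -- callable that maps a feature list to the response body
    """
    if method != "POST":
        return 405, {"error": "Method not allowed; use POST"}

    features = payload.get(FEATURE_KEY) if isinstance(payload, dict) else None
    if not isinstance(features, (list, tuple)) or len(features) != N_FEATURES:
        return 400, {"error": f"Expected an array of {N_FEATURES} features"}

    return 200, predict_fn(list(features))
```

Passing the predictor in as an argument keeps the validation logic testable without loading any model files.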
Running the project
Train or verify model files
Open the notebook to retrain if needed.
Trained models are saved to models/baseline_model.pkl, l1_model.pkl, l2_model.pkl, and scaler.pkl.
Design considerations
- Standardization is mandatory. All three models are sensitive to feature scale. The scaler.pkl StandardScaler must be applied before any model receives input.
- L1 provides feature sparsity. If interpretability is the goal, the L1 model’s non-zero coefficients identify the most predictive features.
- L2 ensures coefficient stability. Use the L2 model when feature collinearity is a concern.
- The API is designed for low-latency synchronous inference — no queuing layer is required at typical HR analytics volumes.
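To illustrate the point about L1 sparsity, the helper below lists the features an L1 model actually kept. It is a sketch that assumes the model exposes a scikit-learn style `coef_` attribute of shape (1, n_features), as a binary `LogisticRegression` does; the function name, the `tol` threshold, and the feature names passed in are hypothetical.

```python
def nonzero_features(model, feature_names, tol=1e-8):
    """Return (name, weight) pairs with non-zero weight, largest magnitude first."""
    weights = list(model.coef_[0])  # single row for a binary classifier
    if len(weights) != len(feature_names):
        raise ValueError("feature_names must match the number of coefficients")
    kept = [
        (name, float(w))
        for name, w in zip(feature_names, weights)
        if abs(w) > tol
    ]
    return sorted(kept, key=lambda pair: abs(pair[1]), reverse=True)
```

Because inputs are standardized before they reach the model, the magnitudes of the remaining weights are roughly comparable across features.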