POST /train

Endpoint

POST /train

Trains a Random Forest Classifier model using the data from train.csv and saves it to model.pkl. This endpoint must be called before making predictions.

Request

This endpoint does not require any request body or parameters.

Headers

No special headers are required for this endpoint.

Response

message (string, required): A message indicating the training result.

Success: "Model successfully trained"
Error: "Something went wrong"

Training Process

The training endpoint performs the following steps:

Load Training Data: Reads data from train.csv
Encode Categorical Variables:
- Gender: Female (0), Male (1), Other (2)
- Smoking History: No Info (0), current (1), ever (2), former (3), never (4), not current (5)
Feature Scaling: Applies StandardScaler to standardize features (zero mean, unit variance)
Resampling: Uses SMOTEENN to handle class imbalance
Model Training: Trains a Random Forest Classifier
Save Model: Serializes the trained model to model.pkl using pickle

If a model file already exists, it will be overwritten automatically (overwrite mode is enabled by default).

Examples

curl -X POST http://localhost/train

Response Examples

Success Response

Status Code: 200 OK

{
  "message": "Model successfully trained"
}

Error Response

Status Code: 200 OK

{
  "message": "Something went wrong"
}

Note that even when errors occur during training, the API returns a 200 status code with an error message. Check the response message to determine success or failure.
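Because the status code is 200 in both cases, a client has to inspect the message field rather than the status. A minimal sketch using only the Python standard library; the helper names and the default base URL are illustrative assumptions, while the success string is the one documented above.

```python
import json
import urllib.request

SUCCESS_MESSAGE = "Model successfully trained"


def training_succeeded(response_body: str) -> bool:
    """Return True only if the JSON body carries the documented success message."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("message") == SUCCESS_MESSAGE


def trigger_training(base_url: str = "http://localhost") -> bool:
    """POST to /train with an empty body and report whether training succeeded."""
    request = urllib.request.Request(f"{base_url}/train", data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return training_succeeded(response.read().decode("utf-8"))
```

Treating anything other than the exact success message as a failure keeps the client safe if the error text ever changes.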

Prerequisites

Training Data

Ensure train.csv exists in the API’s working directory with the following columns:

gender
age
hypertension
heart_disease
smoking_history
bmi
HbA1c_level
blood_glucose_level
diabetes (target variable)
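Since a training failure is reported only as "Something went wrong", it can help to check the CSV header before calling the endpoint. The following is a small sketch with a hypothetical helper, missing_columns, that reads only the first row of the file; the column names are the ones listed above.

```python
import csv

REQUIRED_COLUMNS = [
    "gender", "age", "hypertension", "heart_disease", "smoking_history",
    "bmi", "HbA1c_level", "blood_glucose_level", "diabetes",
]


def missing_columns(csv_path: str) -> list[str]:
    """Return the required columns that are absent from the CSV header row."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f), [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]
```

An empty list means every expected column is present; it says nothing about the data types in the rows below the header.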

Dependencies

The following Python packages must be installed:

scikit-learn (RandomForestClassifier, StandardScaler)
imbalanced-learn (SMOTEENN)
pandas
pickle (part of the Python standard library; no installation required)
loguru

Disk Space

Ensure sufficient disk space is available to save the model file (model.pkl).

Model Configuration

The trained model uses the following configuration:

model (RandomForestClassifier): Random Forest Classifier with default scikit-learn parameters
scaler (StandardScaler): StandardScaler for feature normalization
resampler (SMOTEENN): SMOTEENN with random_state=42 for handling class imbalance

Common Issues

Missing train.csv file

Error: Training fails because train.csv is not found.
Solution: Ensure the train.csv file is present in the API’s working directory with the correct column structure.

Insufficient memory

Error: Training fails due to memory constraints.
Solution: Reduce the size of the training dataset or increase available system memory.

Invalid data format

Error: Training fails due to incorrect data types in train.csv.
Solution: Verify that all columns have the correct data types (strings for categorical, numeric for continuous variables).

Permission denied for model.pkl

Error: Cannot write model file due to permission issues.
Solution: Ensure the API has write permissions in the working directory.

Training Time

Training time depends on several factors:

Dataset size: Larger datasets take longer to train
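CPU resources: More CPU cores can speed up Random Forest training, but only when the n_jobs parameter is set; with default scikit-learn parameters (n_jobs=None), the trees are fitted in a single job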
SMOTEENN resampling: Adds overhead for balancing classes

Typical training time for a dataset with ~100,000 rows: 30-120 seconds

Next Steps

After successfully training the model:

Verify that model.pkl was created in the working directory
Use the POST /predict endpoint to make predictions
Monitor model performance and retrain periodically with updated data
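The first check can be done with a few lines of Python. This is a sketch with a hypothetical helper, load_model; it assumes model.pkl sits in the current working directory and that scikit-learn is importable in the process that unpickles it. Only unpickle files from a source you trust, since pickle can execute arbitrary code on load.

```python
import pickle
from pathlib import Path


def load_model(path: str = "model.pkl"):
    """Return the unpickled object, or raise FileNotFoundError if the file is absent."""
    model_file = Path(path)
    if not model_file.is_file():
        raise FileNotFoundError(f"{path} not found; call POST /train first")
    with model_file.open("rb") as f:
        return pickle.load(f)
```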

Make Predictions

Learn how to use the trained model to make diabetes predictions
