Endpoint
Trains the diabetes prediction model on the data in train.csv and saves it to model.pkl. This endpoint must be called before making predictions.
Request
This endpoint does not require any request body or parameters.
Headers
No special headers are required for this endpoint.
Response
A message indicating the training result.
- Success: "Model successfully trained"
- Error: "Something went wrong"
Training Process
The training endpoint performs the following steps:
- Load Training Data: Reads data from train.csv
- Encode Categorical Variables:
  - Gender: Female (0), Male (1), Other (2)
  - Smoking History: No Info (0), current (1), ever (2), former (3), never (4), not current (5)
- Feature Scaling: Applies StandardScaler to normalize features
- Resampling: Uses SMOTEENN to handle class imbalance
- Model Training: Trains a Random Forest Classifier
- Save Model: Serializes the trained model to model.pkl using pickle
If a model file already exists, it will be overwritten automatically (overwrite mode is enabled by default).
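The steps above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the service's actual code: the function name `train_and_save`, its parameters, and the use of `Series.map` for encoding are hypothetical. Only the category codes, the libraries, and `random_state=42` come from this page.

```python
import pickle

# Category codes as listed in the training steps.
GENDER_CODES = {"Female": 0, "Male": 1, "Other": 2}
SMOKING_CODES = {
    "No Info": 0, "current": 1, "ever": 2,
    "former": 3, "never": 4, "not current": 5,
}


def train_and_save(csv_path="train.csv", model_path="model.pkl"):
    """Hypothetical helper mirroring the documented training steps."""
    # Third-party imports are local so the mappings above stay importable alone.
    import pandas as pd
    from imblearn.combine import SMOTEENN
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import StandardScaler

    # 1. Load training data.
    df = pd.read_csv(csv_path)

    # 2. Encode categorical variables (unknown labels become NaN).
    df["gender"] = df["gender"].map(GENDER_CODES)
    df["smoking_history"] = df["smoking_history"].map(SMOKING_CODES)

    features = df.drop(columns=["diabetes"])
    target = df["diabetes"]

    # 3. Scale features.
    scaled = StandardScaler().fit_transform(features)

    # 4. Resample to handle class imbalance.
    resampled_x, resampled_y = SMOTEENN(random_state=42).fit_resample(scaled, target)

    # 5. Train a Random Forest with default parameters.
    model = RandomForestClassifier()
    model.fit(resampled_x, resampled_y)

    # 6. Serialize the model; "wb" overwrites any existing file.
    with open(model_path, "wb") as handle:
        pickle.dump(model, handle)
    return model
```

The sketch saves only the classifier, as the steps describe. Whether the service also persists the fitted scaler for use at prediction time is not stated here.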
Examples
Response Examples
Success Response
Status Code: 200 OK
Error Response
Status Code: 200 OK
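Because both outcomes return 200 OK, a client cannot rely on the status code alone and has to inspect the message. A minimal sketch of that check follows; the helper name `training_succeeded` is hypothetical, and it assumes the message has already been extracted from the response as a string.

```python
# Message texts as documented in the Response section.
SUCCESS_MESSAGE = "Model successfully trained"
ERROR_MESSAGE = "Something went wrong"


def training_succeeded(message: str) -> bool:
    """Return True only when the message matches the documented success text."""
    # Tolerate surrounding whitespace or quotes left over from a raw body.
    return message.strip().strip('"') == SUCCESS_MESSAGE
```

For example, `training_succeeded("Something went wrong")` is `False` even though the HTTP status was 200.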
Prerequisites
Training Data
Ensure train.csv exists in the API’s working directory with the following columns:
- gender
- age
- hypertension
- heart_disease
- smoking_history
- bmi
- HbA1c_level
- blood_glucose_level
- diabetes (target variable)
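One way to check the column list before calling the endpoint is to compare the CSV header against it. The sketch below uses only the standard library; the helper names `read_header` and `missing_columns` are hypothetical.

```python
import csv

# Columns listed above; "diabetes" is the target variable.
REQUIRED_COLUMNS = [
    "gender", "age", "hypertension", "heart_disease", "smoking_history",
    "bmi", "HbA1c_level", "blood_glucose_level", "diabetes",
]


def read_header(csv_path="train.csv"):
    """Return the first row of the CSV file as a list of column names."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle), [])


def missing_columns(header):
    """Return the required columns that are absent from the given header."""
    present = {name.strip() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]
```

Calling `missing_columns(read_header())` returns an empty list when every required column is present.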
Dependencies
The following Python packages must be installed:
- scikit-learn (RandomForestClassifier, StandardScaler)
- imbalanced-learn (SMOTEENN)
- pandas
- pickle (part of the Python standard library; no installation needed)
- loguru
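A quick way to confirm the third-party packages are importable is to look them up by module name. This is a sketch with a hypothetical helper `missing_modules`; note that scikit-learn is imported as `sklearn` and imbalanced-learn as `imblearn`.

```python
from importlib.util import find_spec

# Import names of the third-party packages listed above.
THIRD_PARTY_MODULES = ["sklearn", "imblearn", "pandas", "loguru"]


def missing_modules(names=THIRD_PARTY_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]
```

An empty result from `missing_modules()` means all four modules can be located by the current interpreter.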
Model Configuration
The trained model uses the following configuration:
- Random Forest Classifier with default scikit-learn parameters
- StandardScaler for feature normalization
- SMOTEENN with random_state=42 for handling class imbalance
Common Issues
Missing train.csv file
Error: Training fails because train.csv is not found.
Solution: Ensure the train.csv file is present in the API’s working directory with the correct column structure.
Insufficient memory
Error: Training fails due to memory constraints.
Solution: Reduce the size of the training dataset or increase available system memory.
Invalid data format
Error: Training fails due to incorrect data types in train.csv.
Solution: Verify that all columns have the correct data types (strings for categorical, numeric for continuous variables).
Permission denied for model.pkl
Error: Cannot write model file due to permission issues.
Solution: Ensure the API has write permissions in the working directory.
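Two of these issues, the missing file and the write permission, can be detected before training starts. The sketch below is one possible pre-flight check; the function name `preflight_problems` and its messages are hypothetical.

```python
import os


def preflight_problems(work_dir="."):
    """Return a list of problems that would make training fail early."""
    problems = []
    # train.csv must exist in the working directory.
    if not os.path.isfile(os.path.join(work_dir, "train.csv")):
        problems.append("train.csv not found")
    # The directory must be writable so model.pkl can be saved.
    if not os.access(work_dir, os.W_OK):
        problems.append("working directory is not writable")
    return problems
```

An empty list means neither issue was found; memory and data-format problems still only surface during training.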
Training Time
Training time depends on several factors:
- Dataset size: Larger datasets take longer to train
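- CPU resources: More CPU cores can speed up Random Forest training when n_jobs is set (for example n_jobs=-1); with scikit-learn's default parameters, training runs on a single core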
- SMOTEENN resampling: Adds overhead for balancing classes
Next Steps
After successfully training the model:
- Verify that model.pkl was created in the working directory
- Use the POST /predict endpoint to make predictions
- Monitor model performance and retrain periodically with updated data
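The first step can be checked by loading the file back with pickle. This is a minimal sketch; the helper name `load_model` is hypothetical. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle


def load_model(model_path="model.pkl"):
    """Load the serialized model, failing clearly if training has not run."""
    if not os.path.isfile(model_path):
        raise FileNotFoundError(f"{model_path} not found; run training first")
    with open(model_path, "rb") as handle:
        return pickle.load(handle)
```

If `load_model()` returns without raising, the file exists and contains a valid pickle.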
Make Predictions
Learn how to use the trained model to make diabetes predictions