Overview
Phase 3 provides a production-ready REST API built with FastAPI for real-time diabetes predictions. This phase is designed for web applications, microservices, and integration with other systems.
Best For: Production deployments, web applications, mobile apps, and any system requiring real-time HTTP-based predictions.
Location: ~/workspace/source/fase-3/
Architecture
The API consists of several key components:
fase-3/
├── apirest.py # FastAPI application with endpoints
├── patient.py # Pydantic model for request validation
├── train.py # Training logic (no CLI)
├── predict.py # Prediction logic
├── Dockerfile # Container configuration
└── requirements.txt # Dependencies
apirest.py: Main FastAPI application exposing the /train and /predict endpoints
patient.py: Pydantic model defining the patient data structure and validation rules
train.py: Training function called by the /train endpoint
predict.py: Prediction function that processes Patient objects
Quick Start
Navigate to Directory
cd ~/workspace/source/fase-3
Build Docker Image
docker build -t apirest .
This creates a container with:
Python 3.12
FastAPI 0.111.0
scikit-learn 1.4.1
imbalanced-learn 0.12.0
All required dependencies
Run API Container
docker run -d --name apirest-container -p 80:80 apirest
Flags explained:
-d: Detached mode (runs in background)
--name apirest-container: Container name
-p 80:80: Map host port 80 to container port 80
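The container can take a moment before the server accepts connections. A small stdlib helper (hypothetical, not part of the project) can poll until the API is reachable:

```python
# Hypothetical helper (not part of the project): poll the API until the
# container starts answering HTTP requests, e.g. right after `docker run`.
import time
import urllib.error
import urllib.request

def wait_for_api(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Return True once `url` answers any HTTP status, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=interval)
            return True
        except urllib.error.HTTPError:
            return True  # the server responded, even if with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)
    return False
```

For example, `wait_for_api("http://localhost/docs")` before issuing the first request.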
Copy Training Data
docker cp train.csv apirest-container:/app
The API expects train.csv in the /app directory.
Access API Documentation
Open your browser and navigate to http://localhost/docs. You'll see the interactive Swagger UI.
API Endpoints
POST /train
Trains a new diabetes prediction model using the train.csv file.
Swagger UI
Navigate to http://localhost/docs
Click on POST /train
Click "Try it out"
Click "Execute"
cURL
curl -X POST "http://localhost/train" \
  -H "accept: application/json"
Python
import requests

response = requests.post("http://localhost/train")
print(response.json())
# {'message': 'Model successfully trained'}
JavaScript
fetch('http://localhost/train', {
  method: 'POST',
  headers: { 'Accept': 'application/json' }
})
  .then(response => response.json())
  .then(data => console.log(data));
// {message: 'Model successfully trained'}
Response:
{
  "message": "Model successfully trained"
}
What it does:
Checks if train.csv exists in /app
Loads and encodes the data
Applies StandardScaler normalization
Uses SMOTEENN to balance classes
Trains RandomForestClassifier
Saves model to model.pkl
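The encoding step in the list above can be illustrated in plain Python; these are the same mappings train.py applies with pandas replace:

```python
# The categorical encodings used by train.py (and predict.py).
gender_dict = {"Female": 0, "Male": 1, "Other": 2}
smoking_history_dict = {
    "No Info": 0, "current": 1, "ever": 2,
    "former": 3, "never": 4, "not current": 5,
}

def encode_row(row: dict) -> dict:
    """Replace the two categorical fields with their integer codes."""
    return {
        **row,
        "gender": gender_dict[row["gender"]],
        "smoking_history": smoking_history_dict[row["smoking_history"]],
    }

print(encode_row({"gender": "Female", "smoking_history": "current", "age": 36}))
# {'gender': 0, 'smoking_history': 1, 'age': 36}
```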
POST /predict
Makes a diabetes prediction for a single patient.
Request Body:
{
  "gender": "Female",
  "age": 36,
  "hypertension": 0,
  "heart_disease": 0,
  "smoking_history": "current",
  "bmi": 32.27,
  "HbA1c_level": 6.2,
  "blood_glucose_level": 220
}
Swagger UI
Navigate to http://localhost/docs
Click on POST /predict
Click "Try it out"
Modify the request body with patient data
Click "Execute"
cURL
curl -X POST "http://localhost/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "gender": "Female",
    "age": 36,
    "hypertension": 0,
    "heart_disease": 0,
    "smoking_history": "current",
    "bmi": 32.27,
    "HbA1c_level": 6.2,
    "blood_glucose_level": 220
  }'
Python
import requests

patient_data = {
    "gender": "Female",
    "age": 36,
    "hypertension": 0,
    "heart_disease": 0,
    "smoking_history": "current",
    "bmi": 32.27,
    "HbA1c_level": 6.2,
    "blood_glucose_level": 220
}

response = requests.post(
    "http://localhost/predict",
    json=patient_data
)
print(response.json())
# {'message': 'Tiene diabetes'}
JavaScript
const patientData = {
  gender: "Female",
  age: 36,
  hypertension: 0,
  heart_disease: 0,
  smoking_history: "current",
  bmi: 32.27,
  HbA1c_level: 6.2,
  blood_glucose_level: 220
};

fetch('http://localhost/predict', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(patientData)
})
  .then(response => response.json())
  .then(data => console.log(data));
// {message: 'Tiene diabetes'}
Response:
Has Diabetes
{
  "message": "Tiene diabetes"
}
No Diabetes
{
  "message": "No tiene diabetes"
}
Error
{
  "message": "Something went wrong"
}
Source Code Deep Dive
apirest.py - Main Application
from fastapi import FastAPI
from patient import Patient
import train as tr
import predict as pr

app = FastAPI()

@app.post("/train")
def train():
    """
    Exposes an endpoint to train a machine learning model.
    """
    return tr.train()

@app.post("/predict")
def predict(patient: Patient):
    """
    Exposes an endpoint to make a prediction based on patient information.

    Parameters:
    - patient: A Patient object containing the patient's input information.

    Returns:
    - A dictionary with a prediction message.
    """
    return pr.predict(patient)
FastAPI Features:
Automatic request validation via Pydantic
Interactive API docs at /docs
OpenAPI schema at /openapi.json
Type hints for better IDE support
patient.py - Request Model
from pydantic import BaseModel

class Patient(BaseModel):
    """
    Represents a patient with their respective attributes.

    Attributes:
    - gender (str): Patient's gender (Female, Male, Other)
    - age (int): Patient's age
    - hypertension (int): Hypertension disease (1: yes, 0: no)
    - heart_disease (int): Heart disease (1: yes, 0: no)
    - smoking_history (str): Patient's smoking history (not current, former, No Info, current, never, ever)
    - bmi (float): Patient's body mass index
    - HbA1c_level (float): Patient's hemoglobin A1c level
    - blood_glucose_level (int): Patient's blood sugar level
    """
    gender: str
    age: int
    hypertension: int
    heart_disease: int
    smoking_history: str
    bmi: float
    HbA1c_level: float
    blood_glucose_level: int
Pydantic Validation Benefits
The Patient model provides automatic validation:
Type Checking:
# This will fail - age must be int
{"age": "thirty-six", ...}
Required Fields:
# This will fail - missing bmi
{"gender": "Female", "age": 36, ...}
Type Coercion:
# This works - "36" converted to 36
{"age": "36", ...}
FastAPI automatically returns detailed error messages for invalid requests.
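These behaviours can be checked directly with Pydantic. A minimal sketch, assuming Pydantic v2; the model mirrors patient.py:

```python
from pydantic import BaseModel, ValidationError

class Patient(BaseModel):
    gender: str
    age: int
    hypertension: int
    heart_disease: int
    smoking_history: str
    bmi: float
    HbA1c_level: float
    blood_glucose_level: int

base = {
    "gender": "Female", "age": 36, "hypertension": 0, "heart_disease": 0,
    "smoking_history": "current", "bmi": 32.27, "HbA1c_level": 6.2,
    "blood_glucose_level": 220,
}

# Type coercion: the string "36" is accepted and converted to int.
assert Patient(**{**base, "age": "36"}).age == 36

# Type checking: a non-numeric string is rejected.
try:
    Patient(**{**base, "age": "thirty-six"})
except ValidationError:
    print("age must be an integer")

# Required fields: omitting bmi is rejected.
try:
    Patient(**{k: v for k, v in base.items() if k != "bmi"})
except ValidationError:
    print("bmi is required")
```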
train.py - Training Logic
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from imblearn.combine import SMOTEENN
from loguru import logger
import os
import pandas as pd
import pickle

model_file = "model.pkl"  # Model file path
data_file = "train.csv"   # Training data file path
overwrite = True          # Overwrite existing model

def train():
    """
    Trains a machine learning model using data from 'train.csv'
    and saves the trained model to 'model.pkl'.
    """
    try:
        # Check if model file already exists
        if os.path.isfile(model_file):
            if overwrite:
                logger.info(f"overwriting existing model file {model_file}")
            else:
                logger.info(
                    f"model file {model_file} exists. exitting. use --overwrite_model option"
                )
                exit(-1)
        # Load training data
        logger.info("loading train data")
        z = pd.read_csv(data_file)
        # Encode training data
        logger.info("encoding train data")
        gender_dict = {"Female": 0, "Male": 1, "Other": 2}
        smoking_history_dict = {
            "No Info": 0,
            "current": 1,
            "ever": 2,
            "former": 3,
            "never": 4,
            "not current": 5,
        }
        z = z.replace({"gender": gender_dict, "smoking_history": smoking_history_dict})
        # Separate features (Xtr) and labels (ytr)
        Xtr = z.drop("diabetes", axis=1)
        ytr = z[["diabetes"]]
        # Scale training data
        logger.info("scaling train data")
        scaler = StandardScaler()
        Xtr = scaler.fit_transform(Xtr)
        # Apply oversampling and undersampling with SMOTEENN
        smote_enn = SMOTEENN(random_state=42)
        Xtr, ytr = smote_enn.fit_resample(Xtr, ytr)
        # Train the model
        logger.info("fitting model")
        m = RandomForestClassifier()
        m.fit(Xtr, ytr)
        # Save model to file
        logger.info(f"saving model to {model_file}")
        with open(model_file, "wb") as f:
            pickle.dump(m, f)
        return {"message": "Model successfully trained"}
    except:
        return {"message": "Something went wrong"}
The bare except clause catches all exceptions without logging details, which makes debugging difficult. Consider improving the error handling:
except Exception as e:
    logger.error(f"Training failed: {str(e)}")
    return {"message": f"Training failed: {str(e)}"}
predict.py - Prediction Logic
from patient import Patient
from sklearn.preprocessing import StandardScaler
from imblearn.combine import SMOTEENN
from loguru import logger
import os
import pandas as pd
import pickle

model_file = "model.pkl"  # Model file path

def predict(patient: Patient):
    """
    Makes a prediction based on the input patient information.

    Parameters:
    - patient: A Patient object containing the patient's input information.

    Returns:
    - A dictionary with a prediction message.
    """
    try:
        # Load input data
        Xts = pd.DataFrame([patient.model_dump()])
        # Verify model file exists
        if not os.path.isfile(model_file):
            logger.error(f"model file {model_file} does not exist")
            exit(-1)
        # Encode input data
        logger.info("encoding data")
        gender_dict = {"Female": 0, "Male": 1, "Other": 2}
        smoking_history_dict = {
            "No Info": 0,
            "current": 1,
            "ever": 2,
            "former": 3,
            "never": 4,
            "not current": 5,
        }
        Xts = Xts.replace({"gender": gender_dict, "smoking_history": smoking_history_dict})
        # Scale input data
        logger.info("scaling data")
        scaler = StandardScaler()
        Xts = scaler.fit_transform(Xts)
        # Load model
        logger.info("loading model")
        with open(model_file, "rb") as f:
            m = pickle.load(f)
        # Make predictions
        logger.info("making predictions")
        preds = m.predict(Xts)
        return {"message": "Tiene diabetes" if preds[0] == 1 else "No tiene diabetes"}
    except:
        return {"message": "Something went wrong"}
patient.model_dump() is Pydantic v2 syntax. In Pydantic v1, use patient.dict() instead.
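A related caveat: predict() fits a brand-new StandardScaler on the single incoming row, so inputs are not scaled with the training-set statistics. A hedged sketch of one fix, assuming scikit-learn: pickle the fitted scaler together with the model at train time, then reuse it at predict time (the bundle name and the in-memory buffer below are illustrative; the project writes model.pkl to disk).

```python
import io
import pickle
import numpy as np
from sklearn.preprocessing import StandardScaler

# At train time: fit the scaler on the training matrix and bundle it with
# the model in a single pickle (the model itself is omitted in this sketch).
Xtr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(Xtr)
buf = io.BytesIO()
pickle.dump({"scaler": scaler, "model": None}, buf)

# At predict time: load the bundle and transform new rows with the
# *training* statistics instead of refitting on one sample.
buf.seek(0)
bundle = pickle.load(buf)
row = np.array([[2.0, 20.0]])  # equals the training mean
scaled = bundle["scaler"].transform(row)
print(scaled)  # [[0. 0.]]
```

In train(), this would mean dumping {"scaler": scaler, "model": m}; in predict(), calling bundle["scaler"].transform(Xts) instead of fitting a fresh scaler.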
Dockerfile
# Select Python base image
FROM python:3.12

# Set working directory
WORKDIR /app

# Copy necessary files to application directory
ADD .. /app

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Run the application
CMD ["fastapi", "run", "apirest.py", "--port", "80"]
The ADD .. /app instruction copies the parent directory, which only works when the build context actually includes it and is unconventional. A cleaner approach is to copy the build context itself:
COPY . /app
Dependencies
fastapi==0.111.0
scikit-learn==1.4.1.post1
loguru==0.7.2
pandas==2.2.1
imbalanced-learn==0.12.0
Testing the API
Sample Patient Profiles
High Risk
{
  "gender": "Female",
  "age": 65,
  "hypertension": 1,
  "heart_disease": 1,
  "smoking_history": "current",
  "bmi": 35.5,
  "HbA1c_level": 7.2,
  "blood_glucose_level": 250
}
Expected: "Tiene diabetes"
Risk Factors: Elderly, obese, hypertension, heart disease, diabetic HbA1c, very high glucose
Low Risk
{
  "gender": "Male",
  "age": 25,
  "hypertension": 0,
  "heart_disease": 0,
  "smoking_history": "never",
  "bmi": 22.0,
  "HbA1c_level": 4.9,
  "blood_glucose_level": 85
}
Expected: "No tiene diabetes"
Protective Factors: Young, healthy weight, no diseases, non-smoker, normal glucose
Moderate Risk
{
  "gender": "Female",
  "age": 45,
  "hypertension": 0,
  "heart_disease": 0,
  "smoking_history": "former",
  "bmi": 28.5,
  "HbA1c_level": 5.9,
  "blood_glucose_level": 110
}
Expected: Variable
Mixed Factors: Overweight, prediabetic range, but no major diseases
Example from README
{
  "gender": "Female",
  "age": 36,
  "hypertension": 0,
  "heart_disease": 0,
  "smoking_history": "current",
  "bmi": 32.27,
  "HbA1c_level": 6.2,
  "blood_glucose_level": 220
}
Expected: "Tiene diabetes"
This is the exact example from the project README.
Validation Testing
Missing Required Fields
Request:
{
  "gender": "Female",
  "age": 36
  // Missing other required fields
}
Response (422 Unprocessable Entity):
{
  "detail": [
    {
      "loc": ["body", "hypertension"],
      "msg": "field required",
      "type": "value_error.missing"
    },
    // ... other missing fields
  ]
}
Invalid Data Type
Request:
{
  "gender": "Female",
  "age": "thirty-six", // Should be int
  ...
}
Response (422 Unprocessable Entity):
{
  "detail": [
    {
      "loc": ["body", "age"],
      "msg": "value is not a valid integer",
      "type": "type_error.integer"
    }
  ]
}
Invalid Categorical Value
Request:
{
  "gender": "NonBinary", // Not in encoding dict
  ...
}
Response:
{
  "message": "Something went wrong"
}
This should be caught by validation, but currently passes Pydantic and fails during encoding. Consider adding enum validation:
from enum import Enum

class Gender(str, Enum):
    female = "Female"
    male = "Male"
    other = "Other"

class Patient(BaseModel):
    gender: Gender
    ...
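The enum by itself already enforces membership, independent of Pydantic: constructing a Gender from an unknown value raises ValueError.

```python
from enum import Enum

class Gender(str, Enum):
    female = "Female"
    male = "Male"
    other = "Other"

# Valid values resolve to members.
assert Gender("Female") is Gender.female

# Unknown values fail immediately.
try:
    Gender("NonBinary")
except ValueError as e:
    print(e)  # 'NonBinary' is not a valid Gender
```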
Advanced Usage
Custom Port
Run on a different port:
# Run on port 8080 instead of 80
docker run -d --name apirest-container -p 8080:80 apirest
# Access at http://localhost:8080/docs
Environment Variables
Pass configuration via environment variables:
docker run -d \
--name apirest-container \
-p 80:80 \
-e MODEL_FILE=/app/models/diabetes_v2.pkl \
-e DATA_FILE=/app/data/train.csv \
apirest
Modify train.py and predict.py to use environment variables:
import os

model_file = os.getenv("MODEL_FILE", "model.pkl")
data_file = os.getenv("DATA_FILE", "train.csv")
Volume Mounting
Persist models and data:
mkdir -p ./data ./models
docker run -d \
--name apirest-container \
-p 80:80 \
-v $(pwd)/data:/app/data \
-v $(pwd)/models:/app/models \
apirest
# Copy training data
cp train.csv ./data/
# Model will persist in ./models/model.pkl
Health Check Endpoint
Add a health check to apirest.py:
import os  # needed for os.path.isfile

@app.get("/health")
def health():
    return {"status": "healthy"}

@app.get("/model-status")
def model_status():
    model_exists = os.path.isfile("model.pkl")
    return {
        "model_trained": model_exists,
        "model_file": "model.pkl"
    }
CORS for Web Applications
Enable CORS to allow requests from web browsers:
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # In production, specify actual origins
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
Integration Examples
Python Client
import requests

class DiabetesPredictor:
    def __init__(self, base_url="http://localhost"):
        self.base_url = base_url

    def train_model(self):
        """Train the diabetes prediction model."""
        response = requests.post(f"{self.base_url}/train")
        response.raise_for_status()
        return response.json()

    def predict(self, patient_data):
        """Make a prediction for a patient."""
        response = requests.post(
            f"{self.base_url}/predict",
            json=patient_data
        )
        response.raise_for_status()
        return response.json()

# Usage
predictor = DiabetesPredictor()

# Train model
result = predictor.train_model()
print(result)  # {'message': 'Model successfully trained'}

# Make prediction
patient = {
    "gender": "Female",
    "age": 36,
    "hypertension": 0,
    "heart_disease": 0,
    "smoking_history": "current",
    "bmi": 32.27,
    "HbA1c_level": 6.2,
    "blood_glucose_level": 220
}
prediction = predictor.predict(patient)
print(prediction)  # {'message': 'Tiene diabetes'}
React Frontend
import { useState } from 'react';

function DiabetesForm() {
  const [patient, setPatient] = useState({
    gender: 'Female',
    age: 36,
    hypertension: 0,
    heart_disease: 0,
    smoking_history: 'never',
    bmi: 25.0,
    HbA1c_level: 5.5,
    blood_glucose_level: 100
  });
  const [prediction, setPrediction] = useState(null);

  const handleSubmit = async (e) => {
    e.preventDefault();
    const response = await fetch('http://localhost/predict', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(patient)
    });
    const result = await response.json();
    setPrediction(result.message);
  };

  return (
    <form onSubmit={handleSubmit}>
      <input
        type="number"
        value={patient.age}
        onChange={(e) => setPatient({ ...patient, age: parseInt(e.target.value) })}
      />
      {/* Other inputs */}
      <button type="submit">Predict</button>
      {prediction && <div>Result: {prediction}</div>}
    </form>
  );
}
Monitoring and Logging
View Container Logs
# Follow logs in real-time
docker logs -f apirest-container
# View last 100 lines
docker logs --tail 100 apirest-container
# View logs with timestamps
docker logs -t apirest-container
Thanks to loguru, logs are well-formatted:
2024-03-10 10:15:23.456 | INFO | train:train:32 - loading train data
2024-03-10 10:15:24.123 | INFO | train:train:36 - encoding train data
2024-03-10 10:15:25.789 | INFO | train:train:52 - scaling train data
2024-03-10 10:15:28.456 | INFO | train:train:61 - fitting model
2024-03-10 10:15:45.123 | INFO | train:train:66 - saving model to model.pkl
Troubleshooting
Port Already in Use
Error: Bind for 0.0.0.0:80 failed: port is already allocated
Solution: Use a different host port:
docker run -d --name apirest-container -p 8080:80 apirest
Access at http://localhost:8080/docs
Model file does not exist
Error: Prediction returns "Something went wrong" with log: model file model.pkl does not exist
Solution: Train the model first:
Ensure train.csv is copied to container
Call /train endpoint before /predict
docker cp train.csv apirest-container:/app
curl -X POST http://localhost/train
Cannot Access API
Problem: Browser can't connect to http://localhost/docs
Solutions:
Check the container is running:
docker ps
Check the port mapping:
docker port apirest-container
# Should show: 80/tcp -> 0.0.0.0:80
Try with the explicit IP: http://127.0.0.1/docs
Check for a firewall blocking port 80
Validation Errors
Error: Unprocessable Entity with validation details
Reason: Request body doesn't match the Patient schema
Solution: Ensure all required fields are present with correct types:
gender: string (“Female”, “Male”, “Other”)
age: integer
hypertension: integer (0 or 1)
heart_disease: integer (0 or 1)
smoking_history: string (“never”, “current”, “former”, “ever”, “not current”, “No Info”)
bmi: float
HbA1c_level: float
blood_glucose_level: integer
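Because categorical values currently pass Pydantic and only fail during encoding, a client can pre-check them against the encodings train.py expects. A hypothetical stdlib helper (the function name is illustrative, not part of the project):

```python
# Valid categories, taken from the encoding dictionaries in train.py.
VALID_GENDERS = {"Female", "Male", "Other"}
VALID_SMOKING = {"No Info", "current", "ever", "former", "never", "not current"}

def categorical_errors(payload: dict) -> list:
    """Return a list of problems with the categorical fields (empty if OK)."""
    errors = []
    if payload.get("gender") not in VALID_GENDERS:
        errors.append("unknown gender: %r" % payload.get("gender"))
    if payload.get("smoking_history") not in VALID_SMOKING:
        errors.append("unknown smoking_history: %r" % payload.get("smoking_history"))
    return errors

print(categorical_errors({"gender": "NonBinary", "smoking_history": "never"}))
# ["unknown gender: 'NonBinary'"]
```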
Comparison with Other Phases
| Feature | Phase 1 | Phase 2 | Phase 3 |
| --- | --- | --- | --- |
| Interface | Jupyter | CLI | REST API |
| Input Format | Code cells | CSV files | JSON requests |
| Output Format | Inline | CSV files | JSON responses |
| Scalability | Low | Medium | High |
| Web Integration | None | None | Native |
| Real-time | Manual | Batch | Yes |
| Automatic Docs | No | No | Yes (Swagger) |
| Validation | Manual | Manual | Automatic |
| Production Ready | No | Partial | Yes |
Next Steps
API Deployment: Advanced deployment strategies and production considerations
Docker Setup: Docker best practices and advanced configurations
Patient Features: Detailed guide to interpreting patient features
Model Architecture: Understanding the RandomForest model and pipeline