Docker Setup - Diabetes Prediction ML

Overview

Both Phase 2 (CLI) and Phase 3 (API) use Docker containers for consistent, reproducible deployments. This page covers Docker setup, configuration, and best practices for the diabetes prediction system.

Docker Version: The project uses standard Docker (Docker Engine 20.10+)

Base Image: Python 3.12 official image

Why Docker?

Consistency

Same environment on development, testing, and production machines

Isolation

Dependencies don’t conflict with other projects on the same machine

Portability

Run anywhere Docker is installed - Linux, Mac, Windows, cloud

Reproducibility

The Dockerfile lets anyone rebuild the same environment from the same instructions

Prerequisites

Install Docker

Install Docker Desktop (Mac/Windows) or Docker Engine (Linux):

Windows

Download Docker Desktop for Windows
Run installer
Enable WSL 2 backend when prompted
Restart computer
Verify:

docker --version

macOS

Download Docker Desktop for Mac
Drag to Applications folder
Launch Docker Desktop
Wait for Docker to start (whale icon in menu bar)
Verify:

docker --version

Linux

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install docker.io docker-compose
sudo systemctl start docker
sudo systemctl enable docker

# Add user to docker group (avoid sudo); log out and back in for this to take effect
sudo usermod -aG docker $USER

# Verify
docker --version

Verify Installation

Test Docker with a simple container:

docker run hello-world

Expected output:

Hello from Docker!
This message shows that your installation appears to be working correctly.
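
The version requirement stated at the top of this page (Docker Engine 20.10+) can also be checked from a script. The following is a minimal sketch rather than part of the project: the helper names are hypothetical, and the sample string at the bottom is a made-up example of the `Docker version X.Y.Z, build ...` format.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_VERSION = (20, 10)  # minimum Docker Engine version stated on this page


def parse_docker_version(output: str) -> Tuple[int, ...]:
    """Extract the numeric version from `docker --version` output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"no version number found in: {output!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def docker_is_recent_enough(output: str) -> bool:
    """Compare the parsed major.minor version against MIN_VERSION."""
    return parse_docker_version(output)[:2] >= MIN_VERSION


def installed_docker_version() -> Optional[str]:
    """Return the raw `docker --version` output, or None if docker is not on PATH."""
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Sample string for illustration only, not captured from a real machine.
print(parse_docker_version("Docker version 24.0.7, build afdd53b"))
```

On a machine with Docker installed, passing the result of `installed_docker_version()` to `docker_is_recent_enough()` gives a yes/no answer that a setup script could act on.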

Phase 2: CLI Docker Setup

Dockerfile Analysis

# fase-2/Dockerfile

# Select Python base image
FROM python:3.12

# Set working directory
WORKDIR /app

# Copy necessary files to application directory
ADD train.py /app
ADD predict.py /app
ADD requirements.txt /app

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

Line-by-Line Explanation

FROM python:3.12

Base image: Official Python 3.12 from Docker Hub
Includes Python, pip, and standard library
Based on Debian Linux

WORKDIR /app

Sets /app as the working directory
All subsequent commands run in /app
Directory is created if it doesn’t exist

ADD train.py /app

Copies train.py from build context to /app/train.py
ADD can also extract local tar archives and fetch URLs; neither feature is used here, so COPY would behave the same

ADD predict.py /app

Copies predict.py to container

ADD requirements.txt /app

Copies dependencies list

RUN pip install --no-cache-dir -r requirements.txt

Installs Python packages
--no-cache-dir: Reduces image size by not caching packages
Packages: scikit-learn, pandas, imbalanced-learn, loguru, argparse

Dependencies (requirements.txt)

argparse
scikit-learn
loguru
pandas
imbalanced-learn

Building the Image

# Navigate to fase-2 directory
cd ~/workspace/source/fase-2

# Build image with tag 'ai-proyecto-sustituto'
docker build -t ai-proyecto-sustituto .

Build Process (illustrative output in the legacy builder's format; IDs will differ, and BuildKit prints a different layout):

Step 1/6 : FROM python:3.12
 ---> Pulling python:3.12
Step 2/6 : WORKDIR /app
 ---> Running in abc123def456
Step 3/6 : ADD train.py /app
 ---> 9f8e7d6c5b4a
Step 4/6 : ADD predict.py /app
 ---> 3a2b1c0d9e8f
Step 5/6 : ADD requirements.txt /app
 ---> 6d5e4f3a2b1c
Step 6/6 : RUN pip install --no-cache-dir -r requirements.txt
 ---> Running in def456ghi789
Successfully built 7f6e5d4c3b2a
Successfully tagged ai-proyecto-sustituto:latest

Build Time: First build takes 5-10 minutes (downloads base image and installs packages). Subsequent builds are faster due to layer caching.
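
To make the layer-caching remark concrete, here is a toy model in Python. It is a deliberate simplification and not Docker's actual cache implementation: it only assumes that each layer's cache key depends on the previous layer, on the instruction text, and, for ADD/COPY, on the content of the copied file. The function name and the `v1`/`v2` digests are hypothetical.

```python
import hashlib
from typing import Dict, List


def layer_keys(instructions: List[str], file_digests: Dict[str, str]) -> List[str]:
    """Return one cache key per instruction.

    Each key mixes in the previous key, the instruction text, and (for
    ADD/COPY) the digest of the copied file, so one change invalidates
    every later layer as well.
    """
    keys: List[str] = []
    parent = ""
    for instruction in instructions:
        parts = instruction.split()
        material = parent + instruction
        if parts[0] in ("ADD", "COPY"):
            material += file_digests.get(parts[1], "")
        parent = hashlib.sha256(material.encode()).hexdigest()[:12]
        keys.append(parent)
    return keys


PHASE_2 = [
    "FROM python:3.12",
    "WORKDIR /app",
    "ADD train.py /app",
    "ADD predict.py /app",
    "ADD requirements.txt /app",
    "RUN pip install --no-cache-dir -r requirements.txt",
]

unchanged = {"train.py": "v1", "predict.py": "v1", "requirements.txt": "v1"}
edited = {"train.py": "v2", "predict.py": "v1", "requirements.txt": "v1"}

before = layer_keys(PHASE_2, unchanged)
after = layer_keys(PHASE_2, edited)

# Layers are reusable only up to the first key that differs.
reusable = 0
for old, new in zip(before, after):
    if old != new:
        break
    reusable += 1
print(f"{reusable} of {len(PHASE_2)} layers reused after editing train.py")
```

In this model an edit to train.py invalidates the pip install layer too, because that layer comes later. Copying requirements.txt and running pip before copying the source files is a common way to keep the slow install step cached.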

Running the Container

Interactive Mode

docker run -it --name ai-container ai-proyecto-sustituto /bin/bash

Flags:

-i: Interactive (keep STDIN open)
-t: Allocate pseudo-TTY (terminal)
--name ai-container: Name the container
/bin/bash: Command to run (bash shell)

Result: Opens bash shell inside container

root@abc123def456:/app#

Detached Mode

docker run -d --name ai-container ai-proyecto-sustituto sleep infinity

Flags:

-d: Detached (run in background)
sleep infinity: Keep container running

Access later:

docker exec -it ai-container /bin/bash

One-off Command

docker run --rm ai-proyecto-sustituto python train.py --help

Flags:

--rm: Remove container after command completes

Use for: Quick commands without creating persistent containers
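
When these commands are launched from automation rather than typed by hand, it can help to assemble the argument list programmatically. The sketch below is one possible helper (the function name and its parameters are hypothetical); it only builds and prints the three commands from this section and does not call Docker.

```python
import shlex
from typing import List, Optional, Sequence


def docker_run_args(
    image: str,
    command: Sequence[str] = (),
    name: Optional[str] = None,
    interactive: bool = False,
    detach: bool = False,
    remove: bool = False,
) -> List[str]:
    """Assemble the argument list for `docker run` from the flags described above."""
    args = ["docker", "run"]
    if interactive:
        args.append("-it")
    if detach:
        args.append("-d")
    if remove:
        args.append("--rm")
    if name:
        args += ["--name", name]
    args.append(image)
    args += list(command)
    return args


IMAGE = "ai-proyecto-sustituto"

interactive = docker_run_args(IMAGE, ["/bin/bash"], name="ai-container", interactive=True)
detached = docker_run_args(IMAGE, ["sleep", "infinity"], name="ai-container", detach=True)
one_off = docker_run_args(IMAGE, ["python", "train.py", "--help"], remove=True)

for args in (interactive, detached, one_off):
    print(shlex.join(args))
```

Each list can be handed to `subprocess.run(args, check=True)`; passing a list instead of a single string avoids shell quoting problems.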

Phase 3: API Docker Setup

Dockerfile Analysis

# fase-3/Dockerfile

# Select Python base image
FROM python:3.12

# Set working directory
WORKDIR /app

# Copy necessary files to application directory
ADD .. /app

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Run the application
CMD ["fastapi", "run", "apirest.py", "--port", "80"]

Key Differences from Phase 2

ADD .. /app

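Intended to copy the entire fase-3 folder (the build context) into /app
Docker cannot read files outside the build context, so .. does not reach the real parent directory; if the builder rejects the path, ADD . /app is the equivalent form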
Less selective than Phase 2 (which copied specific files)
Includes all Python files, requirements.txt, etc.

CMD ["fastapi", "run", "apirest.py", "--port", "80"]

Default command when container starts
Launches FastAPI application on port 80
Unlike Phase 2, container runs automatically (no need for /bin/bash)

Dependencies (requirements.txt)

fastapi==0.111.0
scikit-learn==1.4.1.post1
loguru==0.7.2
pandas==2.2.1
imbalanced-learn==0.12.0

Phase 3 pins exact dependency versions for production stability; Phase 2 leaves them unpinned, so pip installs whatever versions are latest at build time.
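
The difference between the two requirements files can be checked mechanically. The following sketch (the function name is hypothetical, and the parsing is intentionally simplified to plain `name==version` lines) lists requirements that lack an exact pin, using the two files shown on this page as input.

```python
from typing import List

PHASE_2_REQUIREMENTS = """\
argparse
scikit-learn
loguru
pandas
imbalanced-learn
"""

PHASE_3_REQUIREMENTS = """\
fastapi==0.111.0
scikit-learn==1.4.1.post1
loguru==0.7.2
pandas==2.2.1
imbalanced-learn==0.12.0
"""


def unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that lack an exact `==` version pin."""
    result = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and "==" not in line:
            result.append(line)
    return result


print("Phase 2 unpinned:", unpinned(PHASE_2_REQUIREMENTS))
print("Phase 3 unpinned:", unpinned(PHASE_3_REQUIREMENTS))
```

A check like this could run in CI to flag unpinned entries before an image is built.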

Building the Image

cd ~/workspace/source/fase-3
docker build -t apirest .

Running the API Container

docker run -d --name apirest-container -p 80:80 apirest

Flags:

-d: Detached mode (background)
--name apirest-container: Container name
-p 80:80: Port mapping (host:container)
apirest: Image name

Port Mapping Explanation:

-p 80:80
   ↑   ↑
   |   Container port (FastAPI listens on 80)
   Host port (access at localhost:80)

Use different host port if 80 is occupied:

docker run -d --name apirest-container -p 8080:80 apirest
# Access at localhost:8080
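
The host:container order is easy to swap by accident. As a small illustration, the sketch below parses the `HOST:CONTAINER` form used on this page and derives the URL to open on the host. The function names are hypothetical, and the other forms that `-p` accepts (such as an IP prefix or a protocol suffix) are deliberately not handled.

```python
from typing import Tuple


def parse_port_mapping(spec: str) -> Tuple[int, int]:
    """Split a `HOST:CONTAINER` port spec into (host_port, container_port)."""
    host, sep, container = spec.partition(":")
    if not sep or not host.isdigit() or not container.isdigit():
        raise ValueError(f"expected HOST:CONTAINER, got {spec!r}")
    return int(host), int(container)


def local_url(spec: str) -> str:
    """URL for reaching the container from the host machine."""
    host_port, _container_port = parse_port_mapping(spec)
    # Port 80 is the HTTP default, so it can be left out of the URL.
    if host_port == 80:
        return "http://localhost"
    return f"http://localhost:{host_port}"


print(local_url("80:80"))
print(local_url("8080:80"))
```

Only the host port appears in the URL; the container port matters only to the process inside the container.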

Data Management

Copying Files to Containers

# Syntax: docker cp <source> <container>:<destination>

# Copy training data
docker cp train.csv ai-container:/app

# Copy test data
docker cp test.csv ai-container:/app

# Copy from subdirectory
docker cp ~/data/train.csv ai-container:/app/data/

Volume Mounting

For persistent data storage and easier file access:

Phase 2 with Volumes

docker run -it \
  --name ai-container \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/models:/app/models \
  ai-proyecto-sustituto /bin/bash

Inside container:

python train.py --data_file /app/data/train.csv --model_file /app/models/model.pkl

Result: Files persist on host even after container is removed

Phase 3 with Volumes

docker run -d \
  --name apirest-container \
  -p 80:80 \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/models:/app/models \
  apirest

Benefit: Models and data persist across container restarts

Windows Users: Use absolute paths:

docker run -v C:\Users\YourName\data:/app/data ...
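
Because the shell syntax for the current directory differs between platforms, a script can compute the absolute host paths instead. This is a minimal sketch with a hypothetical helper name; it builds the `-v` arguments only and assumes the `data` and `models` directories are relative to the directory the script is run from.

```python
from pathlib import Path
from typing import Dict, List


def volume_flags(mounts: Dict[str, str]) -> List[str]:
    """Turn {host_dir: container_dir} into `-v` arguments with absolute host paths."""
    flags: List[str] = []
    for host_dir, container_dir in mounts.items():
        host_path = Path(host_dir).resolve()  # absolute, like $(pwd)/data
        flags += ["-v", f"{host_path}:{container_dir}"]
    return flags


flags = volume_flags({"data": "/app/data", "models": "/app/models"})
print(flags)
```

The resulting list can be spliced into a `docker run` argument list between the other flags and the image name.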

Container Management

Essential Commands

Lifecycle

# Start stopped container
docker start ai-container

# Stop running container
docker stop ai-container

# Restart container
docker restart ai-container

# Remove container
docker rm ai-container

# Force remove running container
docker rm -f ai-container

Inspection

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

# Container details
docker inspect ai-container

# Resource usage
docker stats ai-container

# Port mappings
docker port apirest-container

Logs

# View logs
docker logs ai-container

# Follow logs (real-time)
docker logs -f apirest-container

# Last 100 lines
docker logs --tail 100 ai-container

# With timestamps
docker logs -t ai-container

Execution

# Execute command in running container
docker exec ai-container python --version

# Interactive shell
docker exec -it ai-container /bin/bash

# As specific user
docker exec -u root ai-container apt-get update

Image Management

# List images
docker images

# Remove image
docker rmi apirest

# Remove unused images
docker image prune

# Remove all unused images
docker image prune -a

# Layer history (which instruction created each layer)
docker history apirest

# Image size
docker images apirest --format "{{.Size}}"

Advanced Docker Configurations

Environment Variables

# Pass environment variables
docker run -d \
  -e MODEL_FILE=/app/models/model.pkl \
  -e DATA_FILE=/app/data/train.csv \
  -e LOG_LEVEL=DEBUG \
  apirest

In Python code:

import os

model_file = os.getenv('MODEL_FILE', 'model.pkl')
data_file = os.getenv('DATA_FILE', 'train.csv')
log_level = os.getenv('LOG_LEVEL', 'INFO')
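
One way to extend this pattern is to read and validate all three variables in a single place, so a typo in a `-e` flag fails at startup instead of later. The sketch below is an assumption about how this could be structured, not the project's code: `Settings` and `load_settings` are hypothetical names, and the level names follow loguru's standard levels.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Standard loguru level names.
VALID_LOG_LEVELS = {"TRACE", "DEBUG", "INFO", "SUCCESS", "WARNING", "ERROR", "CRITICAL"}


@dataclass(frozen=True)
class Settings:
    model_file: str
    data_file: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables shown above, falling back to the same defaults."""
    log_level = env.get("LOG_LEVEL", "INFO").upper()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"unsupported LOG_LEVEL: {log_level!r}")
    return Settings(
        model_file=env.get("MODEL_FILE", "model.pkl"),
        data_file=env.get("DATA_FILE", "train.csv"),
        log_level=log_level,
    )


# Passing a plain dict makes the behaviour easy to try without a container.
print(load_settings({"MODEL_FILE": "/app/models/model.pkl", "LOG_LEVEL": "debug"}))
```

Accepting the environment as a parameter keeps the function testable; in the container it would be called with no arguments so that it reads `os.environ`.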

Resource Limits

# Limit memory (setting --memory-swap equal to --memory leaves the container no swap)
docker run -d \
  --memory="2g" \
  --memory-swap="2g" \
  apirest

# Limit CPU
docker run -d \
  --cpus="1.5" \
  apirest

# Combined
docker run -d \
  --memory="2g" \
  --cpus="2" \
  --name apirest-container \
  -p 80:80 \
  apirest

Health Checks

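Add to Dockerfile (this assumes the API serves a /health endpoint and that curl is available in the image, which is not the case for slim base images):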

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:80/health || exit 1

Or in docker run:

docker run -d \
  --health-cmd="curl -f http://localhost:80/health || exit 1" \
  --health-interval=30s \
  --health-timeout=3s \
  --health-retries=3 \
  apirest
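
Where curl is not available in the image (the slim Python images do not include it), the probe can be written with Python's standard library instead. The following is a sketch under assumptions: the `/health` path is taken from the commands above and is not confirmed to exist in the API, and the function names and the `healthcheck.py` filename mentioned afterwards are hypothetical.

```python
import http.client
import urllib.error
import urllib.request
from typing import List


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True when the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def main(argv: List[str]) -> int:
    """Return 0 (healthy) or 1 (unhealthy), the exit codes a health check uses."""
    url = argv[0] if argv else "http://localhost:80/health"
    return 0 if is_healthy(url) else 1
```

Saved as, say, healthcheck.py with a final line `sys.exit(main(sys.argv[1:]))` (plus `import sys`), the script could replace the curl command in the health check definition.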

Multi-stage Builds

Optimize image size:

# Build stage
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Runtime stage
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["fastapi", "run", "apirest.py", "--port", "80"]

Benefits:

Smaller final image (uses slim base)
Faster pulls and deployments
Reduced attack surface

Docker Compose (Optional)

For more complex setups:

# docker-compose.yml
version: '3.8'

services:
  api:
    build: ./fase-3
    container_name: diabetes-api
    ports:
      - "80:80"
    volumes:
      - ./data:/app/data
      - ./models:/app/models
    environment:
      - MODEL_FILE=/app/models/model.pkl
      - DATA_FILE=/app/data/train.csv
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80/health"]
      interval: 30s
      timeout: 3s
      retries: 3

Usage:

# Start services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

# Rebuild and start
docker-compose up -d --build

Best Practices

Use Specific Base Image Versions

# Good
FROM python:3.12.1

# Avoid
FROM python:latest

Ensures reproducibility across builds.

Minimize Layers

# Good - One RUN layer
RUN apt-get update && apt-get install -y \
    package1 \
    package2 \
 && rm -rf /var/lib/apt/lists/*

# Avoid - Multiple RUN layers
RUN apt-get update
RUN apt-get install -y package1
RUN apt-get install -y package2

Use .dockerignore

# .dockerignore
__pycache__
*.pyc
.git
.env
venv/
*.pkl
*.csv

Prevents unnecessary files from being copied into the image.

Don't Run as Root

FROM python:3.12

# Create non-root user
RUN useradd -m -u 1000 appuser
USER appuser

WORKDIR /home/appuser/app
COPY --chown=appuser:appuser . .

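# pip --user places console scripts in ~/.local/bin, so add it to PATH
ENV PATH=/home/appuser/.local/bin:$PATH
RUN pip install --user --no-cache-dir -r requirements.txt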

Use ENV for Configuration

ENV MODEL_FILE=model.pkl \
    DATA_FILE=train.csv \
    LOG_LEVEL=INFO

Troubleshooting

Port already allocated

Error: Bind for 0.0.0.0:80 failed: port is already allocated

Solutions:

Use different port:

docker run -p 8080:80 apirest

Stop conflicting container:

docker ps | grep :80
docker stop <container_id>

Kill process using port:

# Linux/Mac
sudo lsof -i :80
sudo kill <PID>

# Windows
netstat -ano | findstr :80
taskkill /PID <PID> /F
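
A script can also test whether a host port is available before starting the container. The sketch below tries to bind a TCP socket to the port; the function names and the list of candidate ports are hypothetical. Note that on Linux, binding ports below 1024 typically requires elevated privileges, so an unprivileged check may report port 80 as unavailable even when nothing is using it.

```python
import socket
from typing import Iterable


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False  # in use, or not permitted for this user
        return True


def first_free_port(candidates: Iterable[int] = (80, 8080, 8000, 8888)) -> int:
    """Pick the first bindable port from a list of candidates."""
    for port in candidates:
        if port_is_free(port):
            return port
    raise RuntimeError("none of the candidate ports is free")
```

The chosen port would then go on the host side of the mapping, for example `-p 8080:80` if `first_free_port()` returns 8080.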

Container exits immediately

Problem: Container stops right after starting

Diagnosis:

docker logs <container>

Common Causes:

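No CMD in Dockerfile (Phase 2 inherits the base image's default python3 command, which exits when no interactive terminal is attached)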
Application crashes on startup
Missing dependencies

Solutions:

For Phase 2:

docker run -it ai-proyecto-sustituto /bin/bash

For Phase 3:

docker logs apirest-container
# Check for Python errors

Cannot connect to Docker daemon

Error: Cannot connect to the Docker daemon

Solutions:

Windows/Mac: Start Docker Desktop
Linux: Start Docker service:

sudo systemctl start docker

Permissions (Linux):

sudo usermod -aG docker $USER
# Log out and back in

Build fails - no space left

Error: no space left on device

Solution: Clean up Docker resources:

# Remove unused containers
docker container prune

# Remove unused images
docker image prune -a

# Remove unused volumes
docker volume prune

# Nuclear option - remove everything
docker system prune -a --volumes

Security Considerations

Important Security Practices:

Don’t include sensitive data in images:
```
# DON'T do this
COPY kaggle.json /app/
```

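Pass credentials at runtime instead of baking them into the image (values passed with -e remain visible in docker inspect, so restrict access to the host):

docker run -e KAGGLE_KEY="$(cat kaggle_key.txt)" ...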

Scan images for vulnerabilities (recent Docker releases replace the retired docker scan command with Docker Scout):
```
docker scout cves apirest
```

Keep base images updated:

docker pull python:3.12
docker build -t apirest .

Next Steps

CLI Usage

Detailed CLI operations and automation

API Deployment

Production API deployment strategies

Phase 2: CLI

CLI tools walkthrough

Phase 3: API

REST API implementation guide
