The skin cancer detection model was trained using a supervised learning approach on a dataset of pre-identified dermatological images. The training process involved data augmentation, validation splitting, and iterative optimization to achieve strong classification performance.

Dataset overview

The model was trained on thousands of dermatological images spanning seven distinct categories of skin lesions. Each image was professionally labeled with one of the seven classification types.

Dataset composition

Image dimensions

75×100 pixels (height × width)

Color channels

3 channels (RGB color images)

Classification categories

7 distinct skin lesion types

Data format

  • Training data: Stored as NumPy arrays (X.npy, y.npy)
  • Grayscale variant: Additional grayscale version available (X_g.npy)
  • Label encoding: One-hot encoded vectors for multi-class classification
The dataset includes both color (RGB) and grayscale versions of images. The color version is used for the CNN model to capture important skin tone and lesion color information.

Data preprocessing

Train-test split

The dataset was divided into training and testing sets with a fixed random seed:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.4,
    random_state=101
)
  • Training set: 60% of data
  • Test set: 40% of data
  • Random state: Fixed at 101 for reproducibility

Data augmentation

To increase the effective size of the training dataset and improve model generalization, two augmentation techniques were applied:
Horizontal flip
Method: Flip images across the vertical (y) axis.
Rationale: Skin lesions can appear on either side of the body, so horizontal orientation should not affect classification.
X_augmented.append(cv2.flip(X_train[i], 1))
Center zoom
Method: Crop to a 33% zoom into the center of the image, then resize back to the original dimensions.
Rationale: Simulates varying distances between camera and lesion, improving scale invariance.
zoom = 0.33
centerX, centerY = int(IMG_HEIGHT/2), int(IMG_WIDTH/2)
radiusX = int((1 - zoom) * IMG_HEIGHT / 2)
radiusY = int((1 - zoom) * IMG_WIDTH / 2)
minX, maxX = centerX - radiusX, centerX + radiusX
minY, maxY = centerY - radiusY, centerY + radiusY
cropped = X_train[i][minX:maxX, minY:maxY]
new_img = cv2.resize(cropped, (IMG_WIDTH, IMG_HEIGHT))
Augmentation strategy:
  • Each training image received one random transformation
  • 50% probability of horizontal flip
  • 50% probability of zoom augmentation
  • Augmented images were added to the training set, effectively doubling its size
After augmentation, the training set was twice its original size, giving the model more diverse examples to learn from and helping reduce overfitting.
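The strategy above can be sketched as a single loop. This is a numpy-only sketch: `np.flip` stands in for `cv2.flip`, and a simple nearest-neighbour index resize stands in for `cv2.resize`; the random images are placeholders for the real training arrays.

```python
import numpy as np

rng = np.random.default_rng(101)
IMG_HEIGHT, IMG_WIDTH, ZOOM = 75, 100, 0.33

def center_zoom(img, zoom=ZOOM):
    # Crop the central (1 - zoom) fraction of the image, then resize
    # back to the original size with nearest-neighbour indexing
    # (cv2.resize plays this role in the notebook).
    h, w = img.shape[:2]
    rx = int((1 - zoom) * h / 2)
    ry = int((1 - zoom) * w / 2)
    cropped = img[h // 2 - rx : h // 2 + rx, w // 2 - ry : w // 2 + ry]
    rows = np.arange(h) * cropped.shape[0] // h
    cols = np.arange(w) * cropped.shape[1] // w
    return cropped[rows][:, cols]

def augment(X_train):
    # One random transformation per image: 50% horizontal flip,
    # 50% center zoom; results are appended, doubling the set.
    out = []
    for img in X_train:
        if rng.random() < 0.5:
            out.append(np.flip(img, axis=1))  # horizontal flip
        else:
            out.append(center_zoom(img))
    return np.concatenate([X_train, np.stack(out)])

X_train = rng.random((4, IMG_HEIGHT, IMG_WIDTH, 3))
X_aug = augment(X_train)   # twice as many images, same dimensions
```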

Label encoding

The classification labels were converted from integer class indices to one-hot encoded vectors:
import numpy as np

y_train_onehot = np.zeros((y_train.size, 7))
y_train_onehot[np.arange(y_train.size), y_train.astype(int)] = 1
Example encoding:
  • Class 0 (Actinic Keratoses): [1, 0, 0, 0, 0, 0, 0]
  • Class 4 (Melanoma): [0, 0, 0, 0, 1, 0, 0]
  • Class 6 (Vascular Lesion): [0, 0, 0, 0, 0, 0, 1]
One-hot encoding is essential for categorical crossentropy loss and softmax activation, enabling the model to output probability distributions across all classes.

Training configuration

Hyperparameters

The model was trained with the following configuration:
| Parameter | Value | Purpose |
| --- | --- | --- |
| Epochs | 20 | Number of complete passes through the training data |
| Batch size | 10 | Number of samples processed before each weight update |
| Validation split | 0.4 (40%) | Portion of training data held out for validation |
| Learning rate | 0.0001 | Step size for optimizer updates |
| Optimizer | RMSprop | Adaptive learning-rate optimization |
| Decay | 1e-6 | Gradual reduction of the learning rate |
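The RMSprop update that this configuration drives can be sketched in numpy. The smoothing constant `rho=0.9` is an assumption (the Keras default), since only the learning rate and decay are stated above:

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-4, rho=0.9, eps=1e-8):
    # RMSprop keeps a running average of squared gradients per weight
    # and divides each step by its square root, so weights with large
    # or noisy gradients get smaller effective learning rates.
    cache = rho * cache + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Minimize f(w) = w^2 (gradient 2w), starting from w = 1.0
w, cache = 1.0, 0.0
for _ in range(200):
    w, cache = rmsprop_step(w, 2 * w, cache)
```

Each step moves `w` toward the minimum at 0, with the step size adapted by the gradient history rather than fixed.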

Loss function

Categorical crossentropy measures the difference between predicted probability distributions and true labels:
Loss = -Σ y_true * log(y_pred)
Where:
  • y_true is the one-hot encoded true label
  • y_pred is the softmax output probability distribution
Categorical crossentropy is ideal for multi-class classification as it heavily penalizes confident wrong predictions while rewarding correct classifications.
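As a concrete illustration of how the loss penalizes confident wrong predictions, it can be computed by hand with numpy (the probability vectors below are made-up examples):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # Mean over the batch of -sum(y_true * log(y_pred)) per sample;
    # eps guards against log(0).
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

y_true = np.array([[0, 0, 0, 0, 1, 0, 0]])  # class 4 (melanoma)

confident_right = np.array([[.01, .01, .01, .01, .94, .01, .01]])
confident_wrong = np.array([[.94, .01, .01, .01, .01, .01, .01]])

loss_right = categorical_crossentropy(y_true, confident_right)  # -log(0.94), small
loss_wrong = categorical_crossentropy(y_true, confident_wrong)  # -log(0.01), large
```

A confident correct prediction costs about 0.06, while the same confidence on the wrong class costs about 4.6.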

Regularization techniques

To prevent overfitting on the medical imagery:
Implementation: Three dropout layers with varying rates
  • First two dropout layers: 0.25 (25% of neurons dropped)
  • Final dropout layer: 0.5 (50% of neurons dropped)
Effect: Randomly ignores neurons during training, forcing the network to learn redundant representations and improving generalization.
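A minimal numpy sketch of inverted dropout (the variant Keras implements) shows why inference needs no extra scaling:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate, training=True):
    # Inverted dropout: zero a random `rate` fraction of units during
    # training and rescale survivors by 1/(1 - rate), so the expected
    # activation matches inference, where the layer is a no-op.
    if not training:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

a = np.ones(1000)
train_out = dropout(a, rate=0.5)                   # ~half zeros, survivors = 2.0
infer_out = dropout(a, rate=0.5, training=False)   # unchanged
```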
Strategy: 40% of training data reserved for validation
Purpose: Monitor performance on unseen data during training to detect overfitting early.
Metrics tracked:
  • Training accuracy
  • Validation accuracy
  • Training loss
  • Validation loss

Training process

Initialization

The model begins with randomly initialized weights using Glorot Uniform initialization:
kernel_initializer = GlorotUniform()
bias_initializer = Zeros()
Glorot initialization sets initial weights to:
W ~ Uniform(-√(6/(n_in + n_out)), √(6/(n_in + n_out)))
This prevents vanishing or exploding gradients in early training.
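For example, the bound for a hypothetical dense layer with 256 inputs and 7 outputs (the layer sizes here are illustrative, not taken from the model):

```python
import numpy as np

def glorot_uniform_limit(fan_in, fan_out):
    # Glorot/Xavier uniform bound: weights are drawn from
    # U(-limit, +limit) with limit = sqrt(6 / (fan_in + fan_out)).
    return np.sqrt(6.0 / (fan_in + fan_out))

limit = glorot_uniform_limit(256, 7)   # ~0.151 for a 256 -> 7 layer
W = np.random.default_rng(101).uniform(-limit, limit, size=(256, 7))
```

Larger layers get tighter bounds, keeping activation variance roughly constant from layer to layer.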

Forward pass

  1. Input: Batch of 10 images (75×100×3)
  2. Convolutional layers: Extract hierarchical features
  3. Pooling layers: Reduce spatial dimensions
  4. Dense layers: Perform high-level classification reasoning
  5. Output: Probability distribution over 7 classes

Backward pass

  1. Compute loss: Compare predictions to true labels using categorical crossentropy
  2. Calculate gradients: Determine how each weight affects the loss
  3. Update weights: Apply RMSprop optimization to minimize loss
  4. Repeat: Process next batch

Epoch progression

Each epoch consists of:
  1. Training phase: Model sees all training batches and updates weights
  2. Validation phase: Model evaluates on validation set without updating weights
  3. Metrics logging: Record training and validation accuracy/loss
The model completes 20 epochs, each taking several minutes on GPU hardware. Total training time depends on dataset size and hardware acceleration.

Model evaluation metrics

After training, the model’s performance was assessed using multiple metrics:

Accuracy

Definition: Percentage of correct predictions
accuracy = correct_predictions / total_predictions
Use case: Overall model performance across all classes

ROC AUC score

Definition: Area Under the Receiver Operating Characteristic Curve
  • Measures the model’s ability to distinguish between classes
  • Score ranges from 0.0 to 1.0
  • Higher scores indicate better discrimination ability
roc_score = roc_auc_score(y_test_onehot, y_pred_proba)
ROC AUC is particularly valuable for medical applications as it evaluates performance across all classification thresholds, not just the default 0.5.
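For one-hot labels, `roc_auc_score` computes a one-vs-rest AUC per class and averages them. The underlying statistic can be reproduced in numpy as a sketch: AUC equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one.

```python
import numpy as np

def binary_auc(y_true, scores):
    # Mann-Whitney formulation of AUC: fraction of (positive, negative)
    # pairs ranked correctly, counting ties as half-correct.
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def macro_roc_auc(y_onehot, y_proba):
    # One-vs-rest AUC per class, macro-averaged.
    return np.mean([binary_auc(y_onehot[:, c], y_proba[:, c])
                    for c in range(y_onehot.shape[1])])

# Tiny 2-class example with perfect ranking -> AUC of 1.0
y_onehot = np.array([[1, 0], [0, 1], [0, 1]])
y_proba = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])
score = macro_roc_auc(y_onehot, y_proba)
```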

Confusion matrix

A 7×7 matrix showing predictions vs. true labels:
  • Diagonal elements: Correct classifications
  • Off-diagonal elements: Misclassifications
  • Row sums: Total samples per true class
  • Column sums: Total predictions per class
Insights from confusion matrix:
  • Which classes are most accurately classified
  • Common misclassification patterns (e.g., melanoma confused with melanocytic nevus)
  • Class-specific performance variations
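A confusion matrix with these row/column conventions can be built directly with numpy (a sketch; `sklearn.metrics.confusion_matrix` produces the same layout):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=7):
    # cm[i, j] counts samples whose true class is i and predicted
    # class is j; diagonal entries are correct classifications.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Toy labels: four samples, one misclassified (true 4, predicted 5)
y_true = np.array([4, 4, 5, 6])
y_pred = np.array([4, 5, 5, 6])
cm = confusion_matrix(y_true, y_pred)
```

Row sums recover the per-class sample counts, and the trace divided by the total gives overall accuracy.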

Training challenges

Class imbalance

Some skin cancer types are more common than others in the dataset, potentially biasing the model toward frequent classes.

Visual similarity

Certain lesion types (e.g., melanoma vs. melanocytic nevus) can appear visually similar, making classification difficult.

Limited data

Medical imaging datasets are smaller than general image datasets, increasing overfitting risk.

Image quality variation

Real-world dermatological images vary in lighting, focus, and image capture conditions.

Transfer learning alternative

The training notebook also explores transfer learning using MobileNet as a base:
Architecture:
  • Pre-trained MobileNet (trained on ImageNet) as feature extractor
  • Custom classification head with dropout and batch normalization
  • Dense layer with 256 units
  • Output layer with 7 units for skin cancer classification
Advantages:
  • Leverages features learned from millions of images
  • Requires less training data
  • Often achieves better performance
Limitation: TensorFlow.js doesn't support this MobileNet version. Although the MobileNet transfer-learning model achieved the best performance during training, the custom CNN architecture is used for web deployment because of this compatibility constraint.

Model export

After training, the model was converted to TensorFlow.js format for web deployment:
tfjs.converters.save_keras_model(cnn.model_, 'cnn_model')
Export artifacts:
  • model.json: Model architecture and layer configuration
  • group1-shard1of25.bin through group1-shard25of25.bin: Weight parameters
The model can now run entirely in the browser using TensorFlow.js, enabling privacy-preserving client-side inference.

Next steps

Model architecture

Explore the detailed CNN architecture and layer configuration

Classifications

Learn about the 7 skin cancer types the model detects
