The skin cancer detection model was trained using a supervised learning approach on a dataset of pre-identified dermatological images. The training process involved data augmentation, validation splitting, and iterative optimization to achieve strong classification performance.

Dataset overview

The model was trained on thousands of dermatological images spanning seven distinct categories of skin lesions. Each image was professionally labeled with one of the seven classification types.

Dataset composition

Image dimensions

75×100 pixels (height × width)

Color channels

3 channels (RGB color images)

Classification categories

7 distinct skin lesion types

Data format

  • Training data: Stored as NumPy arrays (X.npy, y.npy)
  • Grayscale variant: Additional grayscale version available (X_g.npy)
  • Label encoding: One-hot encoded vectors for multi-class classification
The dataset includes both color (RGB) and grayscale versions of images. The color version is used for the CNN model to capture important skin tone and lesion color information.

Data preprocessing

Train-test split

The dataset was divided into training and testing sets with a fixed random seed:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.4,
    random_state=101
)
  • Training set: 60% of data
  • Test set: 40% of data
  • Random state: Fixed at 101 for reproducibility

Data augmentation

To increase the effective size of the training dataset and improve model generalization, two augmentation techniques were applied:
Horizontal flip
Method: Flip images across the vertical (y) axis.
Rationale: Skin lesions can appear on either side of the body, so horizontal orientation should not affect classification.
X_augmented.append(cv2.flip(X_train[i], 1))
Center zoom
Method: Crop to a 33% zoom into the center of the image, then resize back to the original dimensions.
Rationale: Simulates varying distances between camera and lesion, improving scale invariance.
zoom = 0.33
centerX, centerY = int(IMG_HEIGHT/2), int(IMG_WIDTH/2)
radiusX = int((1 - zoom) * IMG_HEIGHT / 2)
radiusY = int((1 - zoom) * IMG_WIDTH / 2)
minX, maxX = centerX - radiusX, centerX + radiusX
minY, maxY = centerY - radiusY, centerY + radiusY
cropped = X_train[i][minX:maxX, minY:maxY]
new_img = cv2.resize(cropped, (IMG_WIDTH, IMG_HEIGHT))
Augmentation strategy:
  • Each training image received one random transformation
  • 50% probability of horizontal flip
  • 50% probability of zoom augmentation
  • Augmented images were added to the training set, effectively doubling its size
After augmentation, the training set was twice its original size, giving the model more diverse examples to learn from and helping reduce overfitting.
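The strategy above can be sketched as a single loop. This is a numpy-only sketch: `np.flip` stands in for `cv2.flip`, and a simple nearest-neighbour index resize stands in for `cv2.resize`; the random images are placeholders for the real training arrays.

```python
import numpy as np

rng = np.random.default_rng(101)
IMG_HEIGHT, IMG_WIDTH, ZOOM = 75, 100, 0.33

def center_zoom(img, zoom=ZOOM):
    # Crop the central (1 - zoom) fraction of the image, then resize
    # back to the original size with nearest-neighbour indexing
    # (cv2.resize plays this role in the notebook).
    h, w = img.shape[:2]
    rx = int((1 - zoom) * h / 2)
    ry = int((1 - zoom) * w / 2)
    cropped = img[h // 2 - rx : h // 2 + rx, w // 2 - ry : w // 2 + ry]
    rows = np.arange(h) * cropped.shape[0] // h
    cols = np.arange(w) * cropped.shape[1] // w
    return cropped[rows][:, cols]

def augment(X_train):
    # One random transformation per image: 50% horizontal flip,
    # 50% center zoom; results are appended, doubling the set.
    out = []
    for img in X_train:
        if rng.random() < 0.5:
            out.append(np.flip(img, axis=1))  # horizontal flip
        else:
            out.append(center_zoom(img))
    return np.concatenate([X_train, np.stack(out)])

X_train = rng.random((4, IMG_HEIGHT, IMG_WIDTH, 3))
X_aug = augment(X_train)   # twice as many images, same dimensions
```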

Label encoding

The classification labels were converted from integer class indices to one-hot encoded vectors:
import numpy as np

y_train_onehot = np.zeros((y_train.size, 7))
y_train_onehot[np.arange(y_train.size), y_train.astype(int)] = 1
Example encoding:
  • Class 0 (Actinic Keratoses): [1, 0, 0, 0, 0, 0, 0]
  • Class 4 (Melanoma): [0, 0, 0, 0, 1, 0, 0]
  • Class 6 (Vascular Lesion): [0, 0, 0, 0, 0, 0, 1]
One-hot encoding is essential for categorical crossentropy loss and softmax activation, enabling the model to output probability distributions across all classes.

Training configuration

Hyperparameters

The model was trained with the following configuration:
| Parameter | Value | Purpose |
| --- | --- | --- |
| Epochs | 20 | Number of complete passes through the training data |
| Batch size | 10 | Number of samples processed before each weight update |
| Validation split | 0.4 (40%) | Portion of training data held out for validation |
| Learning rate | 0.0001 | Step size for optimizer updates |
| Optimizer | RMSprop | Adaptive learning-rate optimization |
| Decay | 1e-6 | Gradual reduction of the learning rate |
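The RMSprop update that this configuration drives can be sketched in numpy. The smoothing constant `rho=0.9` is an assumption (the Keras default), since only the learning rate and decay are stated above:

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-4, rho=0.9, eps=1e-8):
    # RMSprop keeps a running average of squared gradients per weight
    # and divides each step by its square root, so weights with large
    # or noisy gradients get smaller effective learning rates.
    cache = rho * cache + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Minimize f(w) = w^2 (gradient 2w), starting from w = 1.0
w, cache = 1.0, 0.0
for _ in range(200):
    w, cache = rmsprop_step(w, 2 * w, cache)
```

Each step moves `w` toward the minimum at 0, with the step size adapted by the gradient history rather than fixed.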

Loss function

Categorical crossentropy measures the difference between predicted probability distributions and true labels:
Loss = -Σ y_true * log(y_pred)
Where:
  • y_true is the one-hot encoded true label
  • y_pred is the softmax output probability distribution
Categorical crossentropy is ideal for multi-class classification as it heavily penalizes confident wrong predictions while rewarding correct classifications.
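As a concrete illustration of how the loss penalizes confident wrong predictions, it can be computed by hand with numpy (the probability vectors below are made-up examples):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # Mean over the batch of -sum(y_true * log(y_pred)) per sample;
    # eps guards against log(0).
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

y_true = np.array([[0, 0, 0, 0, 1, 0, 0]])  # class 4 (melanoma)

confident_right = np.array([[.01, .01, .01, .01, .94, .01, .01]])
confident_wrong = np.array([[.94, .01, .01, .01, .01, .01, .01]])

loss_right = categorical_crossentropy(y_true, confident_right)  # -log(0.94), small
loss_wrong = categorical_crossentropy(y_true, confident_wrong)  # -log(0.01), large
```

A confident correct prediction costs about 0.06, while the same confidence on the wrong class costs about 4.6.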

Regularization techniques

To prevent overfitting on the medical imagery:
Implementation: Three dropout layers with varying rates
  • First two dropout layers: 0.25 (25% of neurons dropped)
  • Final dropout layer: 0.5 (50% of neurons dropped)
Effect: Randomly ignores neurons during training, forcing the network to learn redundant representations and improving generalization.
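A minimal numpy sketch of inverted dropout (the variant Keras implements) shows why inference needs no extra scaling:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate, training=True):
    # Inverted dropout: zero a random `rate` fraction of units during
    # training and rescale survivors by 1/(1 - rate), so the expected
    # activation matches inference, where the layer is a no-op.
    if not training:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

a = np.ones(1000)
train_out = dropout(a, rate=0.5)                   # ~half zeros, survivors = 2.0
infer_out = dropout(a, rate=0.5, training=False)   # unchanged
```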
Strategy: 40% of training data reserved for validation
Purpose: Monitor performance on unseen data during training to detect overfitting early.
Metrics tracked:
  • Training accuracy
  • Validation accuracy
  • Training loss
  • Validation loss

Training process

Initialization

The model begins with randomly initialized weights using Glorot Uniform initialization:
kernel_initializer = GlorotUniform()
bias_initializer = Zeros()
Glorot initialization sets initial weights to:
W ~ Uniform(-√(6/(n_in + n_out)), √(6/(n_in + n_out)))
This prevents vanishing or exploding gradients in early training.
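For example, the bound for a hypothetical dense layer with 256 inputs and 7 outputs (the layer sizes here are illustrative, not taken from the model):

```python
import numpy as np

def glorot_uniform_limit(fan_in, fan_out):
    # Glorot/Xavier uniform bound: weights are drawn from
    # U(-limit, +limit) with limit = sqrt(6 / (fan_in + fan_out)).
    return np.sqrt(6.0 / (fan_in + fan_out))

limit = glorot_uniform_limit(256, 7)   # ~0.151 for a 256 -> 7 layer
W = np.random.default_rng(101).uniform(-limit, limit, size=(256, 7))
```

Larger layers get tighter bounds, keeping activation variance roughly constant from layer to layer.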

Forward pass

  1. Input: Batch of 10 images (75×100×3)
  2. Convolutional layers: Extract hierarchical features
  3. Pooling layers: Reduce spatial dimensions
  4. Dense layers: Perform high-level classification reasoning
  5. Output: Probability distribution over 7 classes

Backward pass

  1. Compute loss: Compare predictions to true labels using categorical crossentropy
  2. Calculate gradients: Determine how each weight affects the loss
  3. Update weights: Apply RMSprop optimization to minimize loss
  4. Repeat: Process next batch

Epoch progression

Each epoch consists of:
  1. Training phase: Model sees all training batches and updates weights
  2. Validation phase: Model evaluates on validation set without updating weights
  3. Metrics logging: Record training and validation accuracy/loss
The model completes 20 epochs, each taking several minutes on GPU hardware. Total training time depends on dataset size and hardware acceleration.

Model evaluation metrics

After training, the model’s performance was assessed using multiple metrics:

Accuracy

Definition: Percentage of correct predictions
accuracy = correct_predictions / total_predictions
Use case: Overall model performance across all classes

ROC AUC score

Definition: Area Under the Receiver Operating Characteristic Curve
  • Measures the model’s ability to distinguish between classes
  • Score ranges from 0.0 to 1.0
  • Higher scores indicate better discrimination ability
roc_score = roc_auc_score(y_test_onehot, y_pred_proba)
ROC AUC is particularly valuable for medical applications as it evaluates performance across all classification thresholds, not just the default 0.5.
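For one-hot labels, `roc_auc_score` computes a one-vs-rest AUC per class and averages them. The underlying statistic can be reproduced in numpy as a sketch: AUC equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one.

```python
import numpy as np

def binary_auc(y_true, scores):
    # Mann-Whitney formulation of AUC: fraction of (positive, negative)
    # pairs ranked correctly, counting ties as half-correct.
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def macro_roc_auc(y_onehot, y_proba):
    # One-vs-rest AUC per class, macro-averaged.
    return np.mean([binary_auc(y_onehot[:, c], y_proba[:, c])
                    for c in range(y_onehot.shape[1])])

# Tiny 2-class example with perfect ranking -> AUC of 1.0
y_onehot = np.array([[1, 0], [0, 1], [0, 1]])
y_proba = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])
score = macro_roc_auc(y_onehot, y_proba)
```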

Confusion matrix

A 7×7 matrix showing predictions vs. true labels:
  • Diagonal elements: Correct classifications
  • Off-diagonal elements: Misclassifications
  • Row sums: Total samples per true class
  • Column sums: Total predictions per class
Insights from confusion matrix:
  • Which classes are most accurately classified
  • Common misclassification patterns (e.g., melanoma confused with melanocytic nevus)
  • Class-specific performance variations
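A confusion matrix with these row/column conventions can be built directly with numpy (a sketch; `sklearn.metrics.confusion_matrix` produces the same layout):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=7):
    # cm[i, j] counts samples whose true class is i and predicted
    # class is j; diagonal entries are correct classifications.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Toy labels: four samples, one misclassified (true 4, predicted 5)
y_true = np.array([4, 4, 5, 6])
y_pred = np.array([4, 5, 5, 6])
cm = confusion_matrix(y_true, y_pred)
```

Row sums recover the per-class sample counts, and the trace divided by the total gives overall accuracy.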

Training challenges

Class imbalance

Some skin cancer types are more common than others in the dataset, potentially biasing the model toward frequent classes.

Visual similarity

Certain lesion types (e.g., melanoma vs. melanocytic nevus) can appear visually similar, making classification difficult.

Limited data

Medical imaging datasets are smaller than general image datasets, increasing overfitting risk.

Image quality variation

Real-world dermatological images vary in lighting, focus, and image capture conditions.

Transfer learning alternative

The training notebook also explores transfer learning using MobileNet as a base:
Architecture:
  • Pre-trained MobileNet (trained on ImageNet) as feature extractor
  • Custom classification head with dropout and batch normalization
  • Dense layer with 256 units
  • Output layer with 7 units for skin cancer classification
Advantages:
  • Leverages features learned from millions of images
  • Requires less training data
  • Often achieves better performance
Limitation: TensorFlow.js doesn't support this MobileNet version. Although the MobileNet transfer-learning model achieved the best performance during training, the custom CNN architecture is used for web deployment because of this compatibility constraint.

Model export

After training, the model was converted to TensorFlow.js format for web deployment:
tfjs.converters.save_keras_model(cnn.model_, 'cnn_model')
Export artifacts:
  • model.json: Model architecture and layer configuration
  • group1-shard1of25.bin through group1-shard25of25.bin: Weight parameters
The model can now run entirely in the browser using TensorFlow.js, enabling privacy-preserving client-side inference.

Next steps

Model architecture

Explore the detailed CNN architecture and layer configuration

Classifications

Learn about the 7 skin cancer types the model detects
