## Dataset overview
The model was trained on thousands of dermatological images spanning seven distinct categories of skin lesions. Each image was professionally labeled with one of the seven classification types.

### Dataset composition
- **Image dimensions**: 75×100 pixels (height × width)
- **Color channels**: 3 channels (RGB color images)
- **Classification categories**: 7 distinct skin lesion types
### Data format

- **Training data**: Stored as NumPy arrays (`X.npy`, `y.npy`)
- **Grayscale variant**: Additional grayscale version available (`X_g.npy`)
- **Label encoding**: One-hot encoded vectors for multi-class classification
The dataset includes both color (RGB) and grayscale versions of images. The color version is used for the CNN model to capture important skin tone and lesion color information.
## Data preprocessing
### Train-test split

The dataset was divided into training and testing sets using a stratified split:

- Training set: 60% of the data
- Test set: 40% of the data
- Random state: Fixed at 101 for reproducibility
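This split can be reproduced with scikit-learn's `train_test_split`. The sketch below uses stand-in data with the documented shapes (the real arrays would come from `np.load("X.npy")` and `np.load("y.npy")`):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the documented shapes: 75x100 RGB images, 7 one-hot classes.
X = np.random.default_rng(101).random((100, 75, 100, 3), dtype=np.float32)
labels = np.arange(100) % 7          # class indices 0..6, roughly balanced
y = np.eye(7)[labels]                # one-hot encode

# 60/40 stratified split with random_state fixed at 101 for reproducibility.
# Stratification needs class indices, so pass the argmax of the one-hot labels.
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.40,
    stratify=y.argmax(axis=1),
    random_state=101,
)
print(X_train.shape, X_test.shape)   # (60, 75, 100, 3) (40, 75, 100, 3)
```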
### Data augmentation
To increase the effective size of the training dataset and improve model generalization, two augmentation techniques were applied.

#### Horizontal flip augmentation
**Method**: Flip images across the vertical (y) axis.

**Rationale**: Skin lesions can appear on either side of the body, so horizontal orientation should not affect classification.
#### Zoom augmentation
**Method**: Crop to a 33% zoom into the center of the image, then resize back to the original dimensions.

**Rationale**: Simulates varying distances between camera and lesion, improving scale invariance.
**Augmentation policy**:

- Each training image received one random transformation
- 50% probability of horizontal flip
- 50% probability of zoom augmentation
- Augmented images were added to the training set, effectively doubling its size
After augmentation, the training set held twice as many images, providing more diverse examples for the model to learn from and reducing the risk of overfitting.
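A pure-NumPy sketch of this augmentation policy (the `augment` helper and its nearest-neighbor resize are illustrative rather than the notebook's exact implementation, and the 33% zoom is interpreted here as cropping the central ~67% of each side):

```python
import numpy as np

def augment(img, rng):
    """Apply one random transformation: 50% horizontal flip, else center zoom."""
    if rng.random() < 0.5:
        # Horizontal flip: mirror the image across the vertical axis.
        return img[:, ::-1, :]
    # Zoom: crop the central region, then resize back to the original size.
    h, w, _ = img.shape
    ch, cw = int(h * 0.67), int(w * 0.67)     # keep ~67% of each side
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw, :]
    # Nearest-neighbor resize back to (h, w) with plain index mapping.
    rows = (np.arange(h) * ch / h).astype(int)
    cols = (np.arange(w) * cw / w).astype(int)
    return crop[rows][:, cols]

rng = np.random.default_rng(101)
img = np.random.default_rng(0).random((75, 100, 3))
out = augment(img, rng)
print(out.shape)   # (75, 100, 3): shape is preserved by both transforms
```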
### Label encoding
The classification labels were converted from integer class indices to one-hot encoded vectors:

- Class 0 (Actinic Keratoses): `[1, 0, 0, 0, 0, 0, 0]`
- Class 4 (Melanoma): `[0, 0, 0, 0, 1, 0, 0]`
- Class 6 (Vascular Lesion): `[0, 0, 0, 0, 0, 0, 1]`
One-hot encoding is essential for categorical crossentropy loss and softmax activation, enabling the model to output probability distributions across all classes.
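The mapping above can be produced by indexing an identity matrix (Keras's `to_categorical` does the equivalent); a minimal sketch:

```python
import numpy as np

# Index an identity matrix with the class indices to get one-hot rows.
labels = np.array([0, 4, 6])   # Actinic Keratoses, Melanoma, Vascular Lesion
one_hot = np.eye(7, dtype=int)[labels]
print(one_hot[1])              # [0 0 0 0 1 0 0]  (Melanoma, class 4)
```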
## Training configuration

### Hyperparameters
The model was trained with the following configuration:

| Parameter | Value | Purpose |
|---|---|---|
| Epochs | 20 | Number of complete passes through training data |
| Batch size | 10 | Number of samples processed before updating weights |
| Validation split | 0.4 (40%) | Portion of data held out for validation |
| Learning rate | 0.0001 | Step size for optimizer updates |
| Optimizer | RMSprop | Adaptive learning rate optimization |
| Decay | 1e-6 | Gradual reduction of learning rate |
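How these hyperparameters fit together in Keras, as a sketch (`model`, `X_train`, and `y_train` are assumed to be defined elsewhere in the notebook; note that in recent Keras versions the `decay` argument is only accepted by `tf.keras.optimizers.legacy.RMSprop`):

```python
from tensorflow.keras.optimizers import RMSprop

# Hyperparameters from the table above.
optimizer = RMSprop(learning_rate=1e-4, decay=1e-6)

model.compile(
    optimizer=optimizer,
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

history = model.fit(
    X_train, y_train,
    epochs=20,            # 20 complete passes through the training data
    batch_size=10,        # weights updated every 10 samples
    validation_split=0.4, # 40% held out for validation
)
```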
### Loss function

Categorical crossentropy measures the difference between the predicted probability distribution and the true labels:

`L = -Σᵢ y_true[i] · log(y_pred[i])`

where:

- `y_true` is the one-hot encoded true label
- `y_pred` is the softmax output probability distribution
Categorical crossentropy is ideal for multi-class classification as it heavily penalizes confident wrong predictions while rewarding correct classifications.
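The penalty asymmetry is easy to see numerically; a minimal NumPy sketch (the helper function is illustrative):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and a softmax output."""
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([0, 0, 0, 0, 1, 0, 0])   # melanoma (class 4), one-hot
confident_right = np.array([0.01, 0.01, 0.01, 0.01, 0.94, 0.01, 0.01])
confident_wrong = np.array([0.94, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01])

print(categorical_crossentropy(y_true, confident_right))  # ~0.06 (small loss)
print(categorical_crossentropy(y_true, confident_wrong))  # ~4.6 (large loss)
```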
### Regularization techniques

To prevent overfitting on the medical imagery, the following techniques were used.

#### Dropout regularization
**Implementation**: Three dropout layers with varying rates
- First two dropout layers: 0.25 (25% of neurons dropped)
- Final dropout layer: 0.5 (50% of neurons dropped)
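The mechanism can be sketched in NumPy as inverted dropout, the standard formulation Keras uses (helper name is illustrative):

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units at training time
    and rescale survivors so the expected activation is unchanged."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(101)
x = np.ones(10_000)
out = dropout(x, rate=0.5, rng=rng)   # final-layer rate from above
print((out == 0).mean())              # ~0.5 of units zeroed
print(out.mean())                     # ~1.0, expectation preserved
```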
#### Validation monitoring

**Strategy**: 40% of the training data is reserved for validation.

**Purpose**: Monitor performance on unseen data during training to detect overfitting early.

**Metrics tracked**:
- Training accuracy
- Validation accuracy
- Training loss
- Validation loss
## Training process

### Initialization

The model begins with randomly initialized weights using Glorot Uniform initialization.

### Forward pass
- Input: Batch of 10 images (75×100×3)
- Convolutional layers: Extract hierarchical features
- Pooling layers: Reduce spatial dimensions
- Dense layers: Perform high-level classification reasoning
- Output: Probability distribution over 7 classes
### Backward pass
- Compute loss: Compare predictions to true labels using categorical crossentropy
- Calculate gradients: Determine how each weight affects the loss
- Update weights: Apply RMSprop optimization to minimize loss
- Repeat: Process next batch
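The weight-update step can be sketched with NumPy (an illustrative single-step helper, not the notebook's code; RMSprop scales each gradient by a running root-mean-square of its history):

```python
import numpy as np

def rmsprop_update(w, grad, cache, lr=1e-4, rho=0.9, eps=1e-8):
    """One RMSprop step: accumulate squared gradients, then take a step
    scaled inversely by the RMS of that accumulator."""
    cache = rho * cache + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w = np.array([0.5, -0.3])           # toy weights
cache = np.zeros_like(w)            # squared-gradient accumulator
grad = np.array([0.2, -0.1])        # pretend gradients of the loss w.r.t. w
w, cache = rmsprop_update(w, grad, cache)
```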
### Epoch progression

Each epoch consists of:

- Training phase: Model sees all training batches and updates weights
- Validation phase: Model evaluates on validation set without updating weights
- Metrics logging: Record training and validation accuracy/loss
The model trains for 20 epochs. Total training time depends on dataset size and hardware acceleration; on GPU hardware, each epoch takes several minutes.
## Model evaluation metrics

After training, the model's performance was assessed using multiple metrics.

### Accuracy

**Definition**: Percentage of correct predictions.

### ROC AUC score

**Definition**: Area Under the Receiver Operating Characteristic curve.

- Measures the model's ability to distinguish between classes
- Scores range from 0.0 to 1.0
- Higher scores indicate better discrimination ability
ROC AUC is particularly valuable for medical applications as it evaluates performance across all classification thresholds, not just the default 0.5.
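Both metrics are available in scikit-learn; a sketch on stand-in predictions (3 classes for brevity rather than the model's 7):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

# Stand-in true labels and softmax-style probabilities (each row sums to 1).
y_true = np.array([0, 1, 2, 1, 0, 2])
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
])

acc = accuracy_score(y_true, y_prob.argmax(axis=1))
auc = roc_auc_score(y_true, y_prob, multi_class="ovr")  # one-vs-rest AUC
print(acc, auc)   # 1.0 1.0 for this cleanly separated toy example
```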
### Confusion matrix

A 7×7 matrix showing predictions vs. true labels:

- Diagonal elements: Correct classifications
- Off-diagonal elements: Misclassifications
- Row sums: Total samples per true class
- Column sums: Total predictions per class
The matrix reveals:

- Which classes are most accurately classified
- Common misclassification patterns (e.g., melanoma confused with melanocytic nevus)
- Class-specific performance variations
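scikit-learn's `confusion_matrix` produces exactly this layout; a 3-class stand-in example:

```python
from sklearn.metrics import confusion_matrix

# Stand-in true and predicted class indices (3 classes for brevity).
y_true = [0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 2, 2, 1]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)
# [[1 0 0]    rows = true class, columns = predicted class;
#  [0 1 1]    the diagonal counts correct classifications,
#  [0 1 2]]   off-diagonal entries are misclassifications.
```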
## Training challenges

### Class imbalance
Some skin cancer types are more common than others in the dataset, potentially biasing the model toward frequent classes.
### Visual similarity
Certain lesion types (e.g., melanoma vs. melanocytic nevus) can appear visually similar, making classification difficult.
### Limited data
Medical imaging datasets are smaller than general image datasets, increasing overfitting risk.
### Image quality variation
Real-world dermatological images vary in lighting, focus, and image capture conditions.
## Transfer learning alternative

The training notebook also explores transfer learning using MobileNet as a base.

### MobileNet transfer learning approach

**Architecture**:
- Pre-trained MobileNet (trained on ImageNet) as feature extractor
- Custom classification head with dropout and batch normalization
- Dense layer with 256 units
- Output layer with 7 units for skin cancer classification
**Advantages**:

- Leverages features learned from millions of images
- Requires less training data
- Often achieves better performance
Transfer learning with MobileNet achieved the best performance during training, but the custom CNN architecture is used for deployment due to web compatibility requirements.
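A Keras sketch of this approach (the exact layer ordering, dropout rates, input size, and the frozen base are assumptions; only the Dense layer with 256 units and the 7-way output are documented above):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNet

# Pre-trained MobileNet (ImageNet weights) used as a feature extractor.
base = MobileNet(weights="imagenet", include_top=False, pooling="avg",
                 input_shape=(128, 128, 3))
base.trainable = False   # freeze the pre-trained features

model = models.Sequential([
    base,
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(256, activation="relu"),   # documented 256-unit head
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),  # 7 skin-lesion classes
])
```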
## Model export

After training, the model was converted to TensorFlow.js format for web deployment:

- `model.json`: Model architecture and layer configuration
- `group1-shard1of25.bin` through `group1-shard25of25.bin`: Weight parameters
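The conversion is typically done with the `tensorflowjs_converter` CLI from the `tensorflowjs` pip package; a sketch assuming the trained Keras model was saved as `model.h5` (the filenames here are assumptions):

```shell
pip install tensorflowjs

# Convert the saved Keras model into model.json plus binary weight shards.
tensorflowjs_converter --input_format keras model.h5 web_model/
```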
## Next steps

### Model architecture

Explore the detailed CNN architecture and layer configuration.

### Classifications

Learn about the 7 skin cancer types the model detects.