Model overview
The model is a Sequential architecture built with Keras and converted to TensorFlow.js format for web deployment. It accepts RGB images of size 75×100 pixels (3 channels) and outputs a probability distribution across 7 classification categories. The model is distributed as a model.json file along with 25 binary weight shards (group1-shard1of25.bin through group1-shard25of25.bin), totaling approximately 100MB.
Input specifications
- Input shape: [75, 100, 3] (height × width × channels)
- Data type: float32
- Color space: RGB
- Preprocessing: Images are normalized to float32 values
Architecture layers
The network consists of 27 layers organized into three main stages: convolutional feature extraction, dimensionality reduction, and classification.
Stage 1: Initial convolutional blocks
The first stage applies six consecutive 3×3 convolutional layers with 64 filters each.
Layers 1-12: Feature extraction
- Five convolutional layers use same padding to maintain spatial dimensions
- One convolutional layer uses valid padding to reduce dimensions before pooling
- All convolutions use 3×3 kernels with stride of 1
- ReLU activation functions introduce non-linearity after each convolution
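The effect of each padding mode on output size follows directly from convolution arithmetic; a minimal sketch:

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: str = "same") -> int:
    """Output spatial size of a conv layer (Keras conventions)."""
    if padding == "same":
        return -(-size // stride)              # ceil(size / stride)
    return (size - kernel) // stride + 1       # "valid": no padding

# 'same' 3x3 convs at stride 1 preserve the 75x100 input exactly...
assert (conv_out(75), conv_out(100)) == (75, 100)
# ...while one 'valid' 3x3 conv trims a one-pixel border: 73x98.
assert (conv_out(75, padding="valid"), conv_out(100, padding="valid")) == (73, 98)
```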
Stage 2: Pooling and regularization
After the initial feature extraction, the model applies max pooling to reduce spatial dimensions and dropout for regularization. The first dropout layer uses a rate of 0.25, randomly dropping 25% of neurons during training to prevent overfitting.
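Inverted dropout (the behavior Keras implements) can be illustrated in a few lines; the rescaling by 1/(1 − rate) keeps the expected activation unchanged, and the layer is a no-op at inference time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x: np.ndarray, rate: float = 0.25, training: bool = True) -> np.ndarray:
    """Inverted dropout: zero out `rate` of the activations and scale
    survivors by 1/(1 - rate) so the expected activation is unchanged.
    A no-op at inference time."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

acts = np.ones((4, 4))
assert np.array_equal(dropout(acts, training=False), acts)
```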
Stage 3: Deeper feature learning
The second convolutional block increases filter depth to 128, capturing more complex patterns.
Layers 15-20: Deep feature extraction
- Doubling the filter count to 128 allows the network to learn more abstract features
- Second max pooling layer further reduces spatial dimensions
- Dropout maintains regularization consistency
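The parameter cost of doubling the filter count follows from the Conv2D formula (kernel height × kernel width × input channels × filters, plus one bias per filter); a sketch:

```python
def conv2d_params(kernel: int, in_ch: int, filters: int) -> int:
    """Trainable parameters in a Conv2D layer: a kernel x kernel x in_ch
    weight tensor per filter, plus one bias per filter."""
    return kernel * kernel * in_ch * filters + filters

# First 128-filter layer, fed 64-channel feature maps:
assert conv2d_params(3, 64, 128) == 73_856
# Subsequent 128 -> 128 layers nearly double that:
assert conv2d_params(3, 128, 128) == 147_584
```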
Stage 4: Classification head
The final layers flatten the feature maps and apply fully connected layers for classification.
Layers 21-26: Dense classification layers
- Flatten layer: Converts 2D feature maps into 1D vector
- Dense layer (512 units): Fully connected layer for high-level reasoning
- Dropout (0.5): Aggressive regularization before final prediction
- Output layer (7 units): One neuron per skin cancer category
- Softmax activation: Converts outputs to probability distribution
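Dense-layer parameter counts follow a simpler formula. The 512-unit layer's count depends on the flattened feature-map size, which is not stated here, but the output layer is fixed:

```python
def dense_params(in_features: int, units: int) -> int:
    """Weights plus one bias per unit in a fully connected layer."""
    return in_features * units + units

# 512 high-level features -> 7 class scores:
assert dense_params(512, 7) == 3_591
```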
Training configuration
Optimizer
- Type: RMSprop
- Learning rate: 0.0001
- Decay: 1e-6
RMSprop (Root Mean Square Propagation) is particularly effective for CNNs as it adapts the learning rate for each parameter, helping the model converge more reliably.
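A single RMSprop update can be sketched with numpy. The rho and eps values below are the Keras defaults (assumptions, since only learning rate and decay are specified above), and the 1e-6 learning-rate decay schedule is omitted for brevity:

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-4, rho=0.9, eps=1e-7):
    """One RMSprop update: track a running average of squared gradients
    and divide each step by its root, so every parameter gets its own
    effective learning rate."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w, cache = rmsprop_step(np.array([1.0]), np.array([0.5]), np.array([0.0]))
```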
Loss function
- Type: Categorical crossentropy
- Purpose: Measures the difference between predicted probability distributions and true labels
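A minimal numpy version of this loss: for a one-hot label it reduces to the negative log-probability of the true class, so confident correct predictions score lower than uncertain ones:

```python
import numpy as np

def categorical_crossentropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """For a one-hot label, the loss is the negative log-probability the
    model assigned to the true class."""
    eps = 1e-7                                   # avoid log(0)
    return float(-np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0))))

y_true = np.array([0, 0, 1, 0, 0, 0, 0])         # true class: index 2
confident = np.array([.01, .01, .94, .01, .01, .01, .01])
uncertain = np.full(7, 1 / 7)
# A confident correct prediction is penalized far less than a uniform guess.
assert categorical_crossentropy(y_true, confident) < categorical_crossentropy(y_true, uncertain)
```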
Initialization
- Kernel initializer: Glorot Uniform (also called Xavier initialization)
- Bias initializer: Zeros
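Glorot Uniform draws weights from a uniform distribution whose limit depends on the layer's fan-in and fan-out; a sketch:

```python
import numpy as np

def glorot_uniform(fan_in: int, fan_out: int, rng=np.random.default_rng(0)) -> np.ndarray:
    """Glorot/Xavier uniform: U(-limit, limit) with
    limit = sqrt(6 / (fan_in + fan_out)), chosen to keep activation
    variance roughly constant across layers."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

w = glorot_uniform(512, 7)                       # e.g. the output layer
assert np.all(np.abs(w) <= np.sqrt(6.0 / 519))
```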
Model capacity
Total parameters
The model contains millions of trainable parameters distributed across convolutional and dense layers:
- Convolutional layers: Majority of parameters from 64-filter and 128-filter Conv2D layers
- Dense layers: 512-unit fully connected layer contributes significant parameter count
- Model size: Approximately 100MB across 25 binary shards
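A back-of-the-envelope check relating file size to parameter count, assuming the shards store float32 weights at 4 bytes each (an approximation that ignores shard metadata):

```python
# ~100MB of float32 shards implies roughly 26 million parameters.
size_bytes = 100 * 1024 * 1024
approx_params = size_bytes // 4
assert approx_params == 26_214_400
```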
Architecture benefits
Deep feature hierarchy
Multiple convolutional layers learn increasingly abstract features, from edges and textures to complex lesion patterns.
Regularization
Dropout layers (0.25, 0.25, and 0.5) regularize training, and validation splits help detect overfitting on medical imagery.
Spatial efficiency
Max pooling reduces dimensions while preserving important features, making the model computationally efficient.
Web-ready
TensorFlow.js format enables browser-based inference without server dependencies.
Activation functions
ReLU (Rectified Linear Unit)
Used throughout the network for hidden layers:
- Computationally efficient
- Reduces vanishing gradient problem
- Introduces non-linearity for complex pattern learning
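ReLU itself is one line:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Elementwise max(0, x): cheap to compute, with gradient 1 for any
    positive input, which limits vanishing gradients."""
    return np.maximum(0.0, x)

assert np.array_equal(relu(np.array([-2.0, 0.0, 3.0])), np.array([0.0, 0.0, 3.0]))
```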
Softmax
Used in the final output layer:
- Converts raw scores to probabilities
- Ensures output sums to 1.0
- Enables multi-class classification interpretation
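A numerically stable softmax sketch over 7 hypothetical logits:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: subtract the max logit before
    exponentiating, then normalize into a probability distribution."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

probs = softmax(np.array([2.0, 1.0, 0.5, 0.1, -1.0, -0.5, 0.0]))  # 7 hypothetical logits
assert np.isclose(probs.sum(), 1.0)
assert probs.argmax() == 0           # the largest logit wins
```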
Architecture comparison
This architecture is inspired by VGG-16 but adapted for smaller dermatological images:

| Aspect | VGG-16 Standard | This Model |
|---|---|---|
| Input size | 224×224×3 | 75×100×3 |
| Convolutional blocks | 5 blocks | 2 blocks |
| Max pooling layers | 5 | 2 |
| Dense layers | 3 | 2 |
| Output classes | 1000 (ImageNet) | 7 (skin lesions) |
| Dropout | 0.5 (dense layers only) | 0.25, 0.25, 0.5 |
The reduced input size and simplified architecture make this model more suitable for web deployment while maintaining strong classification performance on skin cancer images.
Next steps
Training process
Learn about the dataset and training methodology
Classifications
Understand the 7 skin cancer categories the model detects