Convolutional Neural Networks (CNNs) are the backbone of modern image classification. Unlike fully-connected networks, CNNs exploit spatial structure through weight sharing and local connectivity, making them far more parameter-efficient on image data.

CNN architecture

A typical CNN consists of three stages:

1. Convolutional layers

A convolution slides a small filter (kernel) across the input, computing a dot product at each position:

$$\text{output}[i,j] = \sum_{m,n} \text{input}[i+m,\, j+n] \cdot \text{kernel}[m,n]$$

Key hyperparameters: kernel size, stride, padding, and number of filters (output channels).
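The formula above can be sketched in plain Python. This is a minimal "valid" convolution (no padding, stride 1), just to make the index arithmetic concrete; real frameworks use optimized implementations.

```python
def conv2d(inp, kernel):
    """Valid cross-correlation: output[i][j] = sum over m,n of inp[i+m][j+n] * kernel[m][n]."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(inp) - kh + 1, len(inp[0]) - kw + 1
    return [[sum(inp[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(ow)]
            for i in range(oh)]

# A 3x3 input with a 2x2 kernel yields a 2x2 output (no padding, stride 1)
image  = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```

Note how the output shrinks from 3×3 to 2×2; padding exists precisely to preserve spatial size when desired.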

2. Pooling layers

Pooling reduces spatial dimensions while retaining dominant features. Max pooling selects the largest value in each local window, making representations more robust to small translations.
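A minimal sketch of non-overlapping max pooling in plain Python (window size equals stride, as in the `MaxPool2d(2)` layers used later):

```python
def max_pool2d(inp, window=2):
    """Keep the largest value in each non-overlapping window x window tile."""
    return [[max(inp[i + m][j + n]
                 for m in range(window) for n in range(window))
             for j in range(0, len(inp[0]), window)]
            for i in range(0, len(inp), window)]

fmap = [[1, 3, 2, 0],
        [5, 6, 1, 2],
        [7, 2, 9, 4],
        [0, 1, 3, 8]]
print(max_pool2d(fmap))  # [[6, 2], [7, 9]]
```

A 4×4 map becomes 2×2: each output entry survives small shifts of the dominant value within its window, which is the translation robustness mentioned above.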

3. Fully connected layers

After several conv+pool blocks, the feature maps are flattened and passed through standard dense layers to produce class logits.
Input (H×W×3)
  → Conv + ReLU  → Feature maps
  → MaxPool      → Reduced maps
  → Conv + ReLU
  → MaxPool
  → Flatten
  → Linear + ReLU
  → Linear (num_classes)
  → Softmax

Training a CNN with PyTorch

Step 1: Prepare data loaders

Use torchvision.datasets and DataLoader to load and batch your images with on-the-fly augmentations.
import torchvision.transforms as transforms
from torchvision import datasets
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

train_dataset = datasets.ImageFolder('data/train', transform=transform)
train_loader  = DataLoader(train_dataset, batch_size=32, shuffle=True)

Step 2: Define the model

Build a custom CNN or load a pretrained architecture from torchvision.models.
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(64 * 56 * 56, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        return self.classifier(x)
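
The `64 * 56 * 56` in the first `Linear` layer is not arbitrary: assuming 224×224 inputs (matching the `Resize` above), each `padding=1`, 3×3 conv preserves H and W, while each `MaxPool2d(2)` halves them. A quick arithmetic check:

```python
# Two conv+pool blocks: convs keep H and W, each pool halves them
h = w = 224
for _ in range(2):
    h //= 2
    w //= 2
flat_features = 64 * h * w  # 64 channels after the second conv
print(h, w, flat_features)  # 56 56 200704
```

Feeding inputs of a different size (or adding a pool stage) changes this number, so the `Linear` layer must be updated to match.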

Step 3: Set loss and optimizer

Cross-entropy loss is standard for multi-class classification. Adam is a reliable default optimizer.
import torch.nn as nn
import torch.optim as optim

model     = SimpleCNN(num_classes=10).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

Step 4: Run the training loop

Iterate over epochs, perform forward and backward passes, and update weights.
import torch

num_epochs = 10  # number of passes over the training set; tune for your task

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()

        optimizer.zero_grad()
        outputs = model(images)
        loss    = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {running_loss/len(train_loader):.4f}")

Transfer learning with pretrained models

Training from scratch requires large datasets. Transfer learning repurposes a model pretrained on ImageNet (1.2 M images, 1000 classes) by replacing only the final classification head.
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision import datasets, models

# Load pretrained model (downloads ImageNet weights on first use)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Replace the classification head to match your dataset
num_classes = 10  # match your dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)  # in_features is 512 for ResNet-18

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Freeze the backbone layers initially (param.requires_grad = False for all but model.fc), train the head for a few epochs, then unfreeze and fine-tune end-to-end with a lower learning rate (e.g., 1e-5).

Common pretrained architectures

| Model | Top-1 accuracy (ImageNet) | Parameters | Notes |
|---|---|---|---|
| ResNet-18 | 69.8% | 11 M | Fast, good baseline |
| ResNet-50 | 76.1% | 25 M | Strong general-purpose model |
| VGG-16 | 71.6% | 138 M | Simple architecture, large |
| EfficientNet-B0 | 77.1% | 5.3 M | Best accuracy/size trade-off |
| MobileNetV3 | 74.0% | 5.4 M | Optimized for edge devices |

Evaluation

After training, evaluate on a held-out test set:
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.cuda(), labels.cuda()
        outputs = model(images)
        _, predicted = outputs.max(1)
        total   += labels.size(0)
        correct += predicted.eq(labels).sum().item()

print(f"Test accuracy: {100 * correct / total:.2f}%")

Resources

Exercise E05: CNN Training

Hands-on exercise: train a CNN from scratch in Google Colab.

VisionColab: Image Classification

Collection of CNN examples and notebooks from the course.

Video: CNN Lecture (2021)

Recorded lecture covering CNN architecture and training.

Video: Complementary CNN

Additional video resource on convolutional neural networks.
