Convolutional neural networks are the workhorse of modern computer vision. Chapter 14 builds from the ground up. It starts with a single Conv2D layer applied to real images, moves through pooling, feature maps, and classic architectures, and goes all the way to state-of-the-art pretrained models and practical transfer learning. You’ll also get an overview of object detection (YOLO, SSD) and semantic segmentation (FCN, U-Net, DeepLab).
What you’ll learn
- How convolutional layers work: filters, feature maps, strides, and padding
- Pooling layers: MaxPooling2D, AveragePooling2D, and their global variants
- Classic CNN architectures: LeNet-5, AlexNet, VGG, GoogLeNet/Inception
- Modern architectures: ResNet50, EfficientNet, MobileNet, DenseNet
- Transfer learning: freezing base layers and fine-tuning
- Data augmentation with Keras preprocessing layers: RandomFlip, RandomRotation, RandomCrop, RandomContrast
- Depthwise separable convolutions (SeparableConv2D, DepthwiseConv2D)
- Object detection concepts: bounding boxes, YOLO, and SSD
- Semantic and instance segmentation overview
Key concepts
Convolutional layers
A Conv2D layer slides one or more small kernels (filters) across the input image. At each location it computes the dot product between the kernel and the local patch beneath it, then adds a bias term. This produces a feature map that highlights wherever the corresponding pattern appears. With padding="same" and a stride of 1, the spatial dimensions are preserved. With padding="valid" (the default), no zero padding is added, so the spatial dimensions shrink. Multiple filters in the same layer each learn to detect a different pattern (edges, textures, etc.), and their feature maps are stacked along the channel dimension.
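To make the sliding dot product concrete, here is a minimal NumPy sketch of one filter applied to a single-channel image. It has no bias term and no input-channel dimension; a real Conv2D layer also sums over all input channels and adds a bias. Like Keras, it computes a cross-correlation, meaning the kernel is not flipped. Its "same" padding follows TensorFlow's convention of putting any odd extra padding on the bottom and right. The function name conv2d_single is hypothetical:

```python
import numpy as np

def conv2d_single(image, kernel, stride=1, padding="valid"):
    """Hypothetical helper: slide one 2-D kernel over a 2-D image, return the feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    if padding == "same":
        # Target output size is ceil(input / stride); zero-pad just enough to reach it,
        # with any odd extra row/column going to the bottom/right.
        out_h, out_w = -(-h // stride), -(-w // stride)
        pad_h = max((out_h - 1) * stride + kh - h, 0)
        pad_w = max((out_w - 1) * stride + kw - w, 0)
        image = np.pad(image, ((pad_h // 2, pad_h - pad_h // 2),
                               (pad_w // 2, pad_w - pad_w // 2)))
    elif padding != "valid":
        raise ValueError("padding must be 'valid' or 'same'")
    h, w = image.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    fmap = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = image[r:r + kh, c:c + kw]
            fmap[i, j] = np.sum(patch * kernel)  # dot product of kernel and patch
    return fmap

# A 6x6 image whose right half is bright: a vertical edge between columns 2 and 3.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)  # responds to dark-to-bright steps

fmap = conv2d_single(image, vertical_edge)
print(fmap.shape)  # (4, 4): "valid" shrinks each side by 3 - 1 = 2
print(fmap[0])     # [0. 3. 3. 0.]: strongest where the patch straddles the edge
print(conv2d_single(image, vertical_edge, padding="same").shape)            # (6, 6)
print(conv2d_single(image, vertical_edge, stride=2, padding="same").shape)  # (3, 3)
```

The explicit double loop is there for readability. Real frameworks vectorize this computation and run it on optimized kernels.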
Pooling
Max pooling keeps the maximum value in each pooling window. This makes the network slightly tolerant of small translations and reduces the spatial size by a factor equal to the stride (typically 2). Global average pooling reduces each feature map to a single scalar (its mean), which dramatically cuts the parameter count of the fully connected head.
Transfer learning
Pre-training large networks on ImageNet (1.2M images, 1000 classes) produces feature representations that transfer remarkably well to other visual tasks. The standard recipe is:
- Load a pretrained model (e.g., ResNet50) with include_top=False, which drops its original ImageNet classification head.
- Freeze the base layers (base_model.trainable = False).
- Add a custom head (pooling + Dense + softmax).
- Train the head for a few epochs.
- Fine-tune: unfreeze the upper base layers and continue training with a low learning rate.
Data augmentation
Augmentation artificially expands the training set by randomly transforming images. Keras preprocessing layers perform these transformations inside the model itself. They are active only during training; at inference time they pass images through unchanged. They also run on the same device as the rest of the model, whether CPU or GPU. Common layers include RandomFlip, RandomRotation, RandomZoom, and RandomContrast.
Code examples
Applying a Conv2D layer
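A minimal sketch, assuming TensorFlow 2 with tf.keras. Random arrays stand in for real photos (a batch of two 70x120 RGB images). The filter count and kernel size are illustrative choices. The code applies a Conv2D layer with both padding modes, then pools the resulting feature maps:

```python
import numpy as np
import tensorflow as tf

# Stand-in for real photos: 2 RGB images of 70x120 pixels, values in [0, 1].
rng = np.random.default_rng(42)
images = rng.random((2, 70, 120, 3)).astype("float32")

# 32 filters of size 7x7; each filter spans all 3 input channels.
conv_valid = tf.keras.layers.Conv2D(filters=32, kernel_size=7)  # padding="valid" by default
conv_same = tf.keras.layers.Conv2D(filters=32, kernel_size=7, padding="same")

fmaps_valid = conv_valid(images)
fmaps_same = conv_same(images)
print(fmaps_valid.shape)  # (2, 64, 114, 32): each side shrinks by 7 - 1 = 6
print(fmaps_same.shape)   # (2, 70, 120, 32): spatial size preserved

# Each filter has 7 * 7 * 3 weights plus 1 bias: 32 * 148 = 4,736 parameters.
print(conv_valid.count_params())  # 4736

# Max pooling with a 2x2 window (stride defaults to 2) halves height and width.
pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(fmaps_valid)
print(pooled.shape)  # (2, 32, 57, 32)

# Global average pooling turns each of the 32 feature maps into a single number.
gap = tf.keras.layers.GlobalAveragePooling2D()(fmaps_valid)
print(gap.shape)  # (2, 32)
```

The filters here are randomly initialized, so the feature maps are not yet meaningful. Only the shapes and the parameter count carry over to a trained layer.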
Transfer learning with EfficientNet
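The first steps of the recipe (load, freeze, add a head) can be sketched as follows, assuming TensorFlow 2 with tf.keras and EfficientNetB0 as the base. The helper name build_transfer_model, the 224x224 input size, and the optimizer settings are illustrative assumptions, not the notebook's exact code. Keras EfficientNet models include their own input rescaling, so they expect raw pixel values in the 0-255 range:

```python
import tensorflow as tf

def build_transfer_model(n_classes, weights="imagenet", input_shape=(224, 224, 3)):
    """Hypothetical helper: frozen EfficientNetB0 base plus a new softmax head."""
    # include_top=False drops the original 1,000-class ImageNet classifier.
    base_model = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=input_shape)
    base_model.trainable = False  # freeze every pretrained layer

    inputs = tf.keras.Input(shape=input_shape)  # raw pixels in [0, 255]
    # training=False keeps the base's BatchNormalization layers in inference
    # mode, which still matters later, after the base is unfrozen.
    x = base_model(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  metrics=["accuracy"])
    return model, base_model

# Usage (downloads the ImageNet weights on first call; train_set and
# valid_set are hypothetical tf.data datasets of (image, label) pairs):
# model, base_model = build_transfer_model(n_classes=5)
# model.fit(train_set, validation_data=valid_set, epochs=3)
```

Only the Dense head's kernel and bias are trainable at this stage. The pretrained features are therefore not disturbed by large gradients from the randomly initialized head.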
Data augmentation pipeline
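A sketch of an augmentation pipeline built from Keras preprocessing layers, assuming a recent TensorFlow 2 release where these layers are available directly in tf.keras.layers. The transformation strengths, the 64x64 image size, and the small 10-class model around the pipeline are illustrative assumptions. The last lines check that the pipeline leaves images untouched when called with training=False, which is the mode Keras uses for predict() and evaluate():

```python
import numpy as np
import tensorflow as tf

# Random transformations, applied independently to each training image.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip(mode="horizontal"),
    tf.keras.layers.RandomRotation(factor=0.05),  # up to 5% of a full turn (18 degrees) either way
    tf.keras.layers.RandomContrast(factor=0.2),
])

# With the pipeline inside the model, fit() applies the random transforms,
# while predict() and evaluate() run it in inference mode (identity).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    data_augmentation,
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

images = np.random.default_rng(0).random((4, 64, 64, 3)).astype("float32")
augmented = data_augmentation(images, training=True)   # randomly transformed
unchanged = data_augmentation(images, training=False)  # passed through as-is
print(np.allclose(unchanged, images))  # True
```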
Fine-tuning the upper layers
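A sketch of the fine-tuning step, assuming model wraps a pretrained base_model that stayed frozen (base_model.trainable = False) while the new head was trained. The helper name fine_tune_setup, the default of 20 unfrozen layers, and the 1e-5 learning rate are illustrative assumptions. Keeping BatchNormalization layers frozen follows a common Keras fine-tuning practice rather than anything confirmed for this chapter:

```python
import tensorflow as tf

def fine_tune_setup(model, base_model, n_unfrozen=20, learning_rate=1e-5):
    """Hypothetical helper: unfreeze the top n_unfrozen layers of base_model."""
    base_model.trainable = True  # makes every layer trainable again...
    layers = base_model.layers
    split = max(len(layers) - n_unfrozen, 0)
    for layer in layers[:split]:
        layer.trainable = False  # ...so refreeze the lower, more generic layers
    for layer in layers[split:]:
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False  # keep BatchNorm layers frozen as well
    # Recompile so the new trainable flags take effect; the low learning rate
    # nudges the pretrained weights instead of overwriting them.
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  metrics=["accuracy"])
    return model

# Usage (train_set and valid_set are hypothetical tf.data datasets):
# fine_tune_setup(model, base_model, n_unfrozen=20)
# model.fit(train_set, validation_data=valid_set, epochs=10)
```

Unfreezing only the upper layers is a deliberate choice. Lower layers detect generic features such as edges and textures that transfer well, while upper layers are more specific to the original ImageNet classes.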
Running this notebook
Enable a GPU
Chapter 14 trains CNNs and runs large pretrained models — a GPU is strongly recommended. In Colab: Runtime → Change runtime type → GPU.
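Once the runtime has been switched, one way to confirm that TensorFlow can actually see the GPU is shown below, assuming TensorFlow 2:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)  # an empty list means training runs on the CPU
```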