Chapter 17 is a tour of generative deep learning. You’ll start with deterministic autoencoders, networks that learn to compress and reconstruct their input, and progressively add sophistication: stacking layers for richer representations, penalising active neurons for sparse codes, corrupting inputs to learn denoising. The chapter then introduces Variational Autoencoders (VAEs), which map inputs to a learned latent distribution rather than a point, enabling smooth interpolation and generation. The final sections cover Generative Adversarial Networks, including the training instability challenges and remedies, and Denoising Diffusion Probabilistic Models (DDPMs), the architecture behind modern image synthesis tools.
What you’ll learn
- Undercomplete autoencoders and their relationship to PCA
- Stacked (deep) autoencoders with encoder and decoder sub-models
- Sparse autoencoders: adding an ℓ1 regularisation penalty on codings
- Denoising autoencoders: training to reconstruct clean inputs from noisy ones
- Variational Autoencoders (VAE): the reparameterisation trick, KL divergence loss
- Generating new images by sampling the VAE latent space
- GANs: generator, discriminator, adversarial training loop, mode collapse
- Progressive GAN, StyleGAN overview
- DDPM diffusion models: forward noising process and learned reverse denoising
Key concepts
Autoencoders
An autoencoder consists of an encoder that maps input x to a latent representation (coding) z, and a decoder that reconstructs x̂ from z. The network is trained to minimise reconstruction error (typically MSE). An undercomplete autoencoder forces z to have lower dimensionality than x, so the encoder must learn the most salient features. With purely linear layers and an MSE loss, this amounts to projecting onto the same subspace that PCA finds. A stacked autoencoder uses multiple hidden layers in each half, learning hierarchical representations.
Variational Autoencoders
A VAE replaces the deterministic coding with a probability distribution, typically a Gaussian parameterised by a mean vector μ and log-variance vector log σ². During training, a coding is sampled from this distribution and passed to the decoder. The loss has two terms: reconstruction loss (as in a standard autoencoder) plus the KL divergence between the learned distribution and the standard normal prior, which regularises the latent space to be smooth and contiguous. At generation time you sample directly from the prior and decode.
The reparameterisation trick makes this differentiable: instead of sampling z ~ N(μ, σ²) directly, you compute z = μ + σ * ε where ε ~ N(0, I). Gradients flow through μ and σ but not through the stochastic ε.
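For a diagonal Gaussian coding distribution and a standard normal prior, the KL term of the VAE loss has a well-known closed form. The version below is written in terms of the log-variance; the symbols n (number of latent dimensions) and L_rec (reconstruction loss) are notation introduced here for illustration.

```latex
\mathcal{L} = \mathcal{L}_{\text{rec}} + D_{\mathrm{KL}}, \qquad
D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\big)
  = -\frac{1}{2} \sum_{i=1}^{n} \left( 1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2 \right)
```

The term is zero when μ = 0 and σ = 1, and it grows as the coding distribution drifts away from the prior. Having the encoder output log σ² rather than σ keeps the variance positive without any extra constraint.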
GANs
A GAN consists of a generator that maps random noise to synthetic data, and a discriminator that tries to distinguish real from generated data. They are trained adversarially: the generator tries to fool the discriminator; the discriminator tries not to be fooled. At the theoretical Nash equilibrium the generator produces perfectly realistic samples and the discriminator can do no better than guess. In practice GAN training is notoriously unstable, with mode collapse (the generator produces limited variety) a common failure mode, and it requires careful architectural choices and training tricks (label smoothing, spectral normalisation, progressive growing).
Diffusion models
DDPMs define a fixed forward process that gradually adds Gaussian noise to a training image over T steps until the image becomes pure noise. A neural network (typically a U-Net) is then trained to reverse this process one step at a time, predicting the noise added at each step. Generation proceeds by starting from random noise and iteratively applying the denoising network T times. Diffusion models have surpassed GANs on image quality benchmarks while being more stable to train.
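The forward noising process can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the chapter's implementation: it assumes a linear variance schedule from 1e-4 to 0.02 over 1,000 steps (the notebook may use a different schedule), and it uses the standard closed-form shortcut that jumps from the clean image to step t in one go. The function names and the three-pixel "image" are hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Jump straight to step t: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
           for x, e in zip(x0, eps)]
    return x_t, eps  # eps is the target the denoising network learns to predict

T = 1000                                   # assumed number of diffusion steps
alpha_bars = cumulative_alphas(linear_beta_schedule(T))
rng = random.Random(0)                     # fixed seed so runs are repeatable
x0 = [0.2, 0.5, 0.9]                       # hypothetical three-pixel "image"
x_early, _ = q_sample(x0, 0, alpha_bars, rng)
x_late, eps_late = q_sample(x0, T - 1, alpha_bars, rng)
print(x_early)   # still close to x0
print(x_late)    # almost indistinguishable from the sampled noise
```

During training, a random step t is drawn for each image, the image is noised with a function like `q_sample`, and the network is asked to recover `eps` from the noisy result and t.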
Code examples
Undercomplete linear autoencoder (PCA-like)
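As a framework-free illustration of the idea, the sketch below trains a tiny linear autoencoder with plain gradient descent. It is a toy under stated assumptions, not the notebook's listing: the data (2-D points on the line y = 2x), the 2 → 1 → 2 layer sizes, the starting weights, the learning rate, and all function names are hypothetical choices made for this example.

```python
# Hypothetical toy data: 2-D points lying on the line y = 2x.
data = [(t / 10, 2 * t / 10) for t in range(-10, 11)]

def reconstruct(x, w_enc, w_dec):
    """Encode a 2-D point to a single coding z, then decode it back to 2-D."""
    z = w_enc[0] * x[0] + w_enc[1] * x[1]      # encoder: 2 -> 1, no activation
    return z, (w_dec[0] * z, w_dec[1] * z)     # decoder: 1 -> 2, no activation

def mse(points, w_enc, w_dec):
    """Squared reconstruction error, averaged over all points."""
    total = 0.0
    for x in points:
        _, x_hat = reconstruct(x, w_enc, w_dec)
        total += (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2
    return total / len(points)

def train(points, steps=500, lr=0.05):
    """Full-batch gradient descent on the reconstruction error."""
    w_enc, w_dec = [0.5, 0.5], [0.5, 0.5]      # arbitrary starting weights
    n = len(points)
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in points:
            z, x_hat = reconstruct(x, w_enc, w_dec)
            err = (x_hat[0] - x[0], x_hat[1] - x[1])
            back = err[0] * w_dec[0] + err[1] * w_dec[1]  # error sent back to z
            for i in range(2):
                g_dec[i] += 2 * err[i] * z / n
                g_enc[i] += 2 * back * x[i] / n
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec

initial_loss = mse(data, [0.5, 0.5], [0.5, 0.5])
w_enc, w_dec = train(data)
final_loss = mse(data, w_enc, w_dec)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

Because the data has only one direction of variation, a single coding value is enough to reconstruct it, which is the same direction PCA would pick as its first principal component.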
Stacked autoencoder on Fashion MNIST
Variational Autoencoder sampling layer
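The sampling step can be sketched without any deep learning framework. The function below is a minimal illustration of the reparameterisation trick, assuming the encoder has already produced a mean and a log-variance for each latent dimension; the name `sample_coding` and the example values are hypothetical.

```python
import math
import random

def sample_coding(mean, log_var, rng=random):
    """Reparameterised sample: z = mean + sigma * eps, with eps ~ N(0, 1)."""
    coding = []
    for mu, lv in zip(mean, log_var):
        sigma = math.exp(lv / 2)          # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)         # the only source of randomness
        coding.append(mu + sigma * eps)
    return coding

rng = random.Random(42)                   # fixed seed so runs are repeatable
z = sample_coding([0.0, 1.0], [0.0, -2.0], rng)
print(z)
```

In a real model the same arithmetic would run on tensors inside a custom layer, so that gradients reach the mean and log-variance while the random draw is treated as a constant.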
Running this notebook
Enable a GPU
Training the VAE and GAN sections is much faster with a GPU. In Colab: Runtime → Change runtime type → GPU.