Autoencoders, GANs, and Diffusion Models (Ch. 17)

Chapter 17 is a tour of generative deep learning. You’ll start with deterministic autoencoders, networks that learn to compress and reconstruct their input, and progressively add sophistication: stacking layers for richer representations, penalising active neurons to obtain sparse codings, and corrupting inputs so the network learns to denoise. The chapter then introduces Variational Autoencoders (VAEs), which map each input to a learned latent distribution rather than a single point, enabling smooth interpolation and generation. The final sections cover Generative Adversarial Networks (GANs), including their training instabilities and common remedies, and Denoising Diffusion Probabilistic Models (DDPMs), the family of models behind many modern image-synthesis tools.

What you’ll learn

Undercomplete autoencoders and their relationship to PCA
Stacked (deep) autoencoders with encoder and decoder sub-models
Sparse autoencoders: adding an ℓ1 regularisation penalty on codings
Denoising autoencoders: training to reconstruct clean inputs from noisy ones
Variational Autoencoders (VAE): the reparameterisation trick, KL divergence loss
Generating new images by sampling the VAE latent space
GANs: generator, discriminator, adversarial training loop, mode collapse
Progressive GAN, StyleGAN overview
DDPM diffusion models: forward noising process and learned reverse denoising

Key concepts

Autoencoders

An autoencoder consists of an encoder that maps an input x to a latent representation (the coding) z, and a decoder that reconstructs x̂ from z. The network is trained to minimise the reconstruction error, typically the mean squared error (MSE) between x and x̂. An undercomplete autoencoder gives z a lower dimensionality than x, so the encoder must keep only the most salient features. With linear activations and an MSE loss, it ends up performing PCA: the codings span the same subspace as the top principal components, although the learned axes need not be the principal axes themselves. A stacked (deep) autoencoder uses multiple hidden layers in each half, learning hierarchical representations.
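The PCA connection can be checked without any framework. The sketch below trains a 2 → 1 → 2 linear autoencoder (no biases) with plain full-batch gradient descent on hypothetical toy data that lies along the direction (1, 1); the data, step count, and learning rate are illustrative assumptions, not the chapter's setup. If the claim holds, the decoder's weight vector should line up with the first principal axis, and the reconstruction MSE should approach 0.02, the variance left along (1, -1), which is the best any rank-1 reconstruction can do on this data.

```python
import math

# Hypothetical toy data: 11 points along the direction (1, 1), each offset
# by +/-0.1 along (1, -1). The best rank-1 reconstruction MSE here is 0.02.
data = []
for i in range(-5, 6):
    t, e = i / 5, 0.1 * (-1) ** i
    data.append((t + e, t - e))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def reconstruction_mse(w_enc, w_dec):
    """Mean over samples of ||x - x_hat||^2, with z = w_enc . x and x_hat = z * w_dec."""
    total = 0.0
    for x in data:
        z = dot(w_enc, x)
        total += sum((x[k] - z * w_dec[k]) ** 2 for k in range(2))
    return total / len(data)

def train_linear_autoencoder(steps=2000, lr=0.1):
    """Full-batch gradient descent on a 2 -> 1 -> 2 linear autoencoder (no biases)."""
    w_enc, w_dec = [1.0, 0.0], [1.0, 0.0]
    n = len(data)
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            z = dot(w_enc, x)                            # 1-D coding
            r = [x[k] - z * w_dec[k] for k in range(2)]  # residual x - x_hat
            wr = dot(w_dec, r)
            for k in range(2):
                g_dec[k] -= 2 * r[k] * z / n             # d(loss)/d(w_dec)
                g_enc[k] -= 2 * wr * x[k] / n            # d(loss)/d(w_enc)
        w_enc = [w_enc[k] - lr * g_enc[k] for k in range(2)]
        w_dec = [w_dec[k] - lr * g_dec[k] for k in range(2)]
    return w_enc, w_dec

w_enc, w_dec = train_linear_autoencoder()
# |cosine| between the decoder direction and the first principal axis (1, 1) / sqrt(2)
cos_to_pc1 = abs(dot(w_dec, (1, 1))) / (math.sqrt(2) * math.hypot(*w_dec))
print(reconstruction_mse(w_enc, w_dec), cos_to_pc1)
```

Only the learned subspace is pinned down: scaling w_enc by c and w_dec by 1/c gives the same reconstructions, which is why the test is on the direction of w_dec rather than its length.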

Variational Autoencoders

A VAE replaces the deterministic coding with a probability distribution, typically a diagonal Gaussian parameterised by a mean vector μ and a log-variance vector log σ² produced by the encoder. During training, a coding is sampled from this distribution and passed to the decoder. The loss has two terms: the reconstruction loss (as in a standard autoencoder) plus the KL divergence between the learned distribution and the standard normal prior N(0, I), which pulls the codings toward the prior and keeps the latent space smooth and contiguous. At generation time you sample directly from the prior and decode. The reparameterisation trick makes the sampling step differentiable: instead of sampling z ~ N(μ, σ²) directly, you compute z = μ + σ ⊙ ε (element-wise), where ε ~ N(0, I) and σ = exp(½ log σ²). Gradients then flow through μ and σ, while all the randomness is isolated in ε, which does not depend on the network's parameters.
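Both ingredients fit in a few lines of plain Python. This is a framework-free sketch (the function names are hypothetical): `sample_coding` applies the same formula as the chapter's `Sampling` layer, and `kl_to_standard_normal` is the closed-form KL divergence between a diagonal Gaussian N(μ, σ²) and N(0, I), summed over coding dimensions.

```python
import math
import random

def sample_coding(mean, log_var, rng=random):
    """Reparameterisation trick: z = mu + sigma * eps, sigma = exp(log_var / 2), eps ~ N(0, 1)."""
    return [mu + math.exp(lv / 2) * rng.gauss(0.0, 1.0) for mu, lv in zip(mean, log_var)]

def kl_to_standard_normal(mean, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over coding dimensions."""
    return 0.5 * sum(math.exp(lv) + mu ** 2 - 1 - lv for mu, lv in zip(mean, log_var))

rng = random.Random(42)
mean, log_var = [2.0], [math.log(0.25)]  # mu = 2, sigma = 0.5
samples = [sample_coding(mean, log_var, rng)[0] for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
print(sample_mean, sample_std)                         # close to mu and sigma
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))   # zero when the distribution equals the prior
```

The KL term is zero only when μ = 0 and σ = 1 in every dimension, and it grows as the encoder's distribution drifts away from the prior; that is the regularising pressure described above.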

GANs

A GAN consists of a generator that maps random noise to synthetic data and a discriminator that tries to distinguish real data from generated data. The two are trained adversarially: the generator tries to fool the discriminator, and the discriminator tries not to be fooled. At the theoretical Nash equilibrium, the generator produces samples indistinguishable from real data, and the discriminator can do no better than guessing (it outputs 0.5 for every input). In practice GAN training is notoriously unstable. Mode collapse, where the generator produces only a limited variety of outputs, is a common failure mode, and training usually requires careful architectural choices and tricks such as label smoothing, spectral normalisation, and progressive growing.
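To make the two adversarial objectives concrete, here is a framework-free sketch of the binary cross-entropy losses commonly used for the two players: the discriminator is scored on labelling real samples 1 and fakes 0, while the generator is scored on getting its fakes labelled 1 (the "non-saturating" generator loss). The function names are hypothetical, the inputs are discriminator output probabilities, and passing `real_label=0.9` gives one-sided label smoothing.

```python
import math

def bce(p, label):
    """Binary cross-entropy for one predicted probability p and a (possibly smoothed) target label."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def discriminator_loss(d_real, d_fake, real_label=1.0):
    """Real samples should be scored real_label, fakes 0; real_label=0.9 is one-sided label smoothing."""
    real = sum(bce(p, real_label) for p in d_real) / len(d_real)
    fake = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real + fake

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants its fakes to be scored as real (label 1)."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

# At the theoretical equilibrium the discriminator outputs 0.5 for everything
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]), generator_loss([0.5, 0.5]))
```

At equilibrium the discriminator loss settles at 2 ln 2 and the generator loss at ln 2; in a real training loop the two players alternate gradient steps on these losses, each updating only its own weights.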

Diffusion models

DDPMs define a fixed forward process that gradually adds Gaussian noise to a training image over T steps until the image is indistinguishable from pure noise. A neural network (typically a U-Net) is then trained to reverse this process one step at a time by predicting the noise that was added at a given step. Generation starts from random Gaussian noise and applies the denoising network T times in sequence, so sampling is much slower than a GAN's single forward pass. Diffusion models have matched or surpassed GANs on several image-quality benchmarks while being more stable to train.
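Training does not actually run the forward process step by step: because every step adds Gaussian noise, x_t can be sampled directly from x_0 as x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, where ᾱ_t is the running product of (1 − β_s). The sketch below assumes the linear β schedule from the original DDPM paper (1e-4 to 0.02 over T = 1000 steps, indexed 0 to T − 1 here); the chapter's notebook may use a different schedule, and the three "pixel" values are made up.

```python
import math
import random

def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar[t] = prod over s <= t of (1 - beta_s), for a linear beta schedule (steps 0..T-1)."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noisy_sample(x0, t, alpha_bars, rng=random):
    """Jump straight to step t: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps  # eps is the regression target for the noise-prediction network

alpha_bars = linear_alpha_bars()
x0 = [0.5, -0.2, 0.9]  # hypothetical "image" of three pixel values
x_t, eps = noisy_sample(x0, 999, alpha_bars, random.Random(0))
print(alpha_bars[0], alpha_bars[-1])  # almost no noise at the first step, almost pure noise at the last
```

A training step then picks a random t, builds (x_t, eps) this way, and minimises the squared error between eps and the network's prediction from x_t and t.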

Code examples

Undercomplete linear autoencoder (PCA-like)

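import numpy as np
import tensorflow as tf

tf.random.set_seed(42)

# Stand-in data (assumption; the chapter uses its own 3D dataset):
# 60 noisy 3D points close to a 2D plane, centred and scaled to std 0.5
rng = np.random.default_rng(42)
X_train = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 3))
X_train += 0.05 * rng.normal(size=X_train.shape)
X_train = ((X_train - X_train.mean(axis=0)) / (2 * X_train.std())).astype(np.float32)

encoder = tf.keras.Sequential([tf.keras.layers.Dense(2)])  # linear 3D -> 2D coding
decoder = tf.keras.Sequential([tf.keras.layers.Dense(3)])  # linear 2D -> 3D reconstruction
autoencoder = tf.keras.Sequential([encoder, decoder])

autoencoder.compile(loss="mse",
                    optimizer=tf.keras.optimizers.SGD(learning_rate=0.5))
history = autoencoder.fit(X_train, X_train, epochs=500, verbose=False)
codings = encoder.predict(X_train)  # 2D projection of each 3D point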

Stacked autoencoder on Fashion MNIST

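import numpy as np
import tensorflow as tf

tf.random.set_seed(42)

# Load Fashion MNIST, scale pixels to [0, 1], keep the last 5,000 training images for validation
(X_train_full, _), (X_test, _) = tf.keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full.astype(np.float32) / 255
X_test = X_test.astype(np.float32) / 255
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]

stacked_encoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu"),   # 30-dimensional codings
])
stacked_decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(28 * 28),
    tf.keras.layers.Reshape([28, 28])               # back to image shape
])
stacked_ae = tf.keras.Sequential([stacked_encoder, stacked_decoder])

stacked_ae.compile(loss="mse", optimizer=tf.keras.optimizers.Nadam())
history = stacked_ae.fit(X_train, X_train, epochs=20,
                         validation_data=(X_valid, X_valid))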

Variational Autoencoder sampling layer

import tensorflow as tf

class Sampling(tf.keras.layers.Layer):
    """Reparameterisation trick: z = mean + exp(log_var / 2) * epsilon, epsilon ~ N(0, I)."""
    def call(self, inputs):
        mean, log_var = inputs
        return tf.random.normal(tf.shape(log_var)) * tf.exp(log_var / 2) + mean

codings_size = 10

inputs = tf.keras.layers.Input(shape=[28, 28])
Z = tf.keras.layers.Flatten()(inputs)
Z = tf.keras.layers.Dense(150, activation="relu")(Z)
Z = tf.keras.layers.Dense(100, activation="relu")(Z)
codings_mean = tf.keras.layers.Dense(codings_size)(Z)     # μ
codings_log_var = tf.keras.layers.Dense(codings_size)(Z)  # log σ²
codings = Sampling()([codings_mean, codings_log_var])
variational_encoder = tf.keras.Model(
    inputs=[inputs], outputs=[codings_mean, codings_log_var, codings])

Running this notebook

Enable a GPU

Training the VAE and GAN sections is much faster with a GPU. In Colab: Runtime → Change runtime type → GPU.

Install dependencies

pip install -r requirements.txt

Keras 2 compatibility

This chapter sets the environment variable TF_USE_LEGACY_KERAS=1 and imports tf_keras (Keras 2), because adding custom losses to a model built with the Functional API is not supported in Keras 3.
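A minimal sketch of that setup, assuming TensorFlow 2.16 or later with the tf-keras package installed: the variable has to be set before TensorFlow is first imported in the process.

```python
import os

# Assumption: TensorFlow >= 2.16 and the tf-keras package are installed.
# Set this before the first `import tensorflow` so that tf.keras resolves
# to Keras 2 (tf_keras) instead of Keras 3.
os.environ["TF_USE_LEGACY_KERAS"] = "1"

# then: import tensorflow as tf   (tf.keras now refers to Keras 2)
```

Setting it in the shell (`export TF_USE_LEGACY_KERAS=1`) before launching Python or Jupyter has the same effect.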

Exercises

Exercises ask you to implement a denoising autoencoder that learns to remove Gaussian noise from Fashion MNIST images, and to experiment with the GAN training loop to reduce mode collapse. Solutions are in the notebook.
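The core idea of a denoising autoencoder is in how the training pairs are built: the network receives a corrupted input but is scored against the clean original. This framework-free sketch uses a hypothetical helper, made-up 4-pixel "images", and an arbitrary noise level of 0.2. Note that Keras's `GaussianNoise` layer draws fresh noise for every batch and only during training, whereas this sketch corrupts each image once.

```python
import random

def make_denoising_pairs(images, noise_std=0.2, rng=random):
    """Return (noisy_input, clean_target) pairs: the model sees corrupted pixels but is scored on the originals."""
    pairs = []
    for img in images:
        noisy = [px + rng.gauss(0.0, noise_std) for px in img]
        pairs.append((noisy, list(img)))
    return pairs

# Hypothetical 4-pixel "images" with values in [0, 1]
images = [[0.0, 0.5, 1.0, 0.25], [0.9, 0.1, 0.3, 0.7]]
pairs = make_denoising_pairs(images, noise_std=0.2, rng=random.Random(1))
```

Because the target is always the clean image, the reconstruction loss can only be lowered by learning to undo the noise, not by copying the input through.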

The KL divergence term in the VAE loss controls how smooth the latent space is. Increasing its weight pulls the codings closer to the standard normal prior but may worsen reconstruction quality. β-VAEs expose this trade-off as an explicit weight β on the KL term (β = 1 recovers the standard VAE).
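Written out, the weighted objective looks like the following, using a squared-error reconstruction term and the closed-form KL divergence for a diagonal Gaussian coding of size n. This is a sketch: the β-VAE paper states the reconstruction term as a log-likelihood, and implementations differ in how each term is averaged or scaled.

```latex
\mathcal{L}_\beta(x) = \lVert x - \hat{x} \rVert^2
  + \beta \cdot \frac{1}{2} \sum_{i=1}^{n} \left( \sigma_i^2 + \mu_i^2 - 1 - \log \sigma_i^2 \right)
```

With β > 1 the KL term dominates and the codings are pushed harder toward N(0, I); with β < 1 reconstructions improve but the latent space is less regular, which makes sampling from the prior less reliable.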
