GAN fundamentals
A GAN consists of two networks trained in opposition:
- Generator G: maps random noise z to a synthetic image G(z).
- Discriminator D: classifies inputs as real (from the dataset) or fake (from G).
Training objective (minimax game)

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_\text{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

At the Nash equilibrium, G produces samples indistinguishable from real data and D outputs D(x) = 1/2 everywhere.

GAN training dynamics
Train the discriminator
Sample a minibatch of real images and a minibatch of generated images. Update D to maximize the log-likelihood of correctly classifying both.
Train the generator
Sample a fresh minibatch of noise vectors. Update G to fool the discriminator; in practice, maximize log D(G(z)) (the non-saturating loss), which gives stronger gradients early in training than minimizing log(1 − D(G(z))).
Basic GAN in PyTorch
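A minimal sketch of the alternating training loop above (the course's original notebook may differ; random tensors stand in for a real image dataset, and the network sizes are illustrative):

```python
import torch
import torch.nn as nn

LATENT_DIM, IMG_DIM = 64, 28 * 28

# Generator G: noise z -> flattened image in [-1, 1]
G = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG_DIM), nn.Tanh(),
)

# Discriminator D: flattened image -> real/fake logit
D = nn.Sequential(
    nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch):
    b = real_batch.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # 1) Train D: maximize log D(x) + log(1 - D(G(z)))
    z = torch.randn(b, LATENT_DIM)
    fake = G(z).detach()  # stop gradients from flowing into G
    d_loss = bce(D(real_batch), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train G: non-saturating loss, maximize log D(G(z))
    z = torch.randn(b, LATENT_DIM)
    g_loss = bce(D(G(z)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Stand-in "real" data: random tensors scaled to [-1, 1]
for _ in range(5):
    d_l, g_l = train_step(torch.rand(32, IMG_DIM) * 2 - 1)
```

With a real dataset, `real_batch` would come from a `DataLoader` over images normalized to [-1, 1] to match the generator's Tanh output.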
GAN variants
DCGAN (Deep Convolutional GAN)
Replaces linear layers with transposed convolutions (generator) and strided convolutions (discriminator). Uses batch normalization throughout. DCGAN produces sharper images and trains more stably than the original GAN.

Key architectural guidelines:
- No pooling layers — use strided convolutions for downsampling.
- Batch norm in both G and D (except the output layer of G and the input layer of D).
- ReLU activations in G; LeakyReLU in D.
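The guidelines above can be sketched as a small DCGAN pair for 32×32 RGB images (layer widths here are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

# Generator: transposed convolutions upsample a 100-d noise vector to 32x32x3.
# Batch norm after every layer except the Tanh output; ReLU activations.
netG = nn.Sequential(
    nn.ConvTranspose2d(100, 128, 4, 1, 0, bias=False),  # 1x1 -> 4x4
    nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),   # 4x4 -> 8x8
    nn.BatchNorm2d(64), nn.ReLU(True),
    nn.ConvTranspose2d(64, 32, 4, 2, 1, bias=False),    # 8x8 -> 16x16
    nn.BatchNorm2d(32), nn.ReLU(True),
    nn.ConvTranspose2d(32, 3, 4, 2, 1, bias=False),     # 16x16 -> 32x32
    nn.Tanh(),
)

# Discriminator: strided convolutions downsample (no pooling layers).
# No batch norm on the input layer; LeakyReLU activations.
netD = nn.Sequential(
    nn.Conv2d(3, 32, 4, 2, 1, bias=False),              # 32x32 -> 16x16
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(32, 64, 4, 2, 1, bias=False),             # 16x16 -> 8x8
    nn.BatchNorm2d(64), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, 4, 2, 1, bias=False),            # 8x8 -> 4x4
    nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 1, 4, 1, 0, bias=False),             # 4x4 -> 1x1 logit
)

z = torch.randn(2, 100, 1, 1)
fake = netG(z)          # (2, 3, 32, 32)
logits = netD(fake)     # (2, 1, 1, 1)
```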
Conditional GAN (cGAN)
Conditions both G and D on a class label y (or any auxiliary information). The generator produces images of a specific class: G(z, y). Useful for class-conditional image synthesis and data augmentation.
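One common way to condition the generator (a sketch, assuming MNIST-like 28×28 images and a learned label embedding; other conditioning schemes exist):

```python
import torch
import torch.nn as nn

N_CLASSES, LATENT_DIM, IMG_DIM = 10, 64, 28 * 28

class CondGenerator(nn.Module):
    """G(z, y): condition generation on the class label y."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(N_CLASSES, N_CLASSES)
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + N_CLASSES, 256), nn.ReLU(),
            nn.Linear(256, IMG_DIM), nn.Tanh(),
        )

    def forward(self, z, y):
        # Concatenate noise with the label embedding so samples
        # become class-specific.
        return self.net(torch.cat([z, self.embed(y)], dim=1))

G = CondGenerator()
z = torch.randn(4, LATENT_DIM)
y = torch.tensor([3, 3, 7, 7])  # request two "3"s and two "7"s
imgs = G(z, y)                  # shape (4, 784)
```

The discriminator would be conditioned the same way: concatenate the label embedding with the (flattened) image before classifying real vs. fake.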
Pix2Pix and CycleGAN
Image-to-image translation GANs. Pix2Pix requires paired training images (input→output). CycleGAN learns the translation without paired data using a cycle-consistency loss.
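The cycle-consistency idea can be shown in a few lines: translating to the other domain and back should recover the input, penalized with an L1 loss. Here two linear layers stand in for CycleGAN's translators G: X→Y and F: Y→X (purely illustrative):

```python
import torch
import torch.nn as nn

# Stand-in translators between domains X and Y (not real CycleGAN networks)
G = nn.Linear(8, 8)  # G: X -> Y
F = nn.Linear(8, 8)  # F: Y -> X

x = torch.randn(16, 8)  # batch from domain X
y = torch.randn(16, 8)  # batch from domain Y

# Cycle-consistency loss:
#   L_cyc = E||F(G(x)) - x||_1 + E||G(F(y)) - y||_1
l1 = nn.L1Loss()
cycle_loss = l1(F(G(x)), x) + l1(G(F(y)), y)
```

In CycleGAN this term is added (with a weight) to the adversarial losses of both translators, which is what removes the need for paired training images.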
Stable Diffusion and diffusion models
Diffusion models have surpassed GANs in image quality and diversity. They operate by learning to reverse a gradual noising process.

The diffusion process
Forward process: add Gaussian noise over T timesteps until the image is pure noise:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$$

Reverse process: a neural network ε_θ(x_t, t) learns to predict the noise added at each step, enabling denoising:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$

Latent Diffusion (Stable Diffusion)
Running diffusion in pixel space is computationally expensive. Latent Diffusion Models (LDM) compress images to a low-dimensional latent space using a pretrained VAE, then run diffusion there:
- Encode: z = E(x) with the VAE encoder.
- Diffuse/denoise in latent space: run the forward and reverse processes on z.
- Decode: the VAE decoder maps the denoised latent back to a pixel-space image.
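The forward process has a useful closed form: x_t can be sampled directly from x_0 as x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, where ᾱ_t = ∏(1 − β_s). A sketch using a DDPM-style linear β schedule (a random tensor stands in for a latent z = E(x)):

```python
import torch

# Linear beta schedule over T steps; alpha_bar_t = prod_{s<=t} (1 - beta_s)
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    ab = alpha_bar[t]
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

x0 = torch.randn(4, 4)               # stand-in for a VAE latent
noise = torch.randn_like(x0)
x_early = q_sample(x0, 10, noise)    # barely noised, still close to x0
x_late = q_sample(x0, T - 1, noise)  # nearly pure Gaussian noise
```

Training a denoiser then amounts to regressing ε_θ(x_t, t) onto the `noise` used in `q_sample`.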
Applications
| Application | Method | Notes |
|---|---|---|
| Synthetic data augmentation | Conditional GAN / diffusion | Generate rare or minority-class examples |
| Style transfer | CycleGAN, neural style | Transform image appearance |
| Super-resolution | SRGAN, ESRGAN | Upsample low-resolution images |
| Inpainting | LaMa, diffusion | Fill masked regions |
| Text-to-image | Stable Diffusion, DALL-E | Generate from text prompts |
Resources
Exercise E09: Image Generation with GAN
Hands-on exercise: train a GAN to generate images from a dataset.
VisionColab: GAN Examples
Collection of GAN examples from the course repository.
Diffusion Models Blog
Accessible overview of diffusion models, DDPM, and latent diffusion.
Video: UNet, GAN & Anomaly Detection
Recorded lecture covering GANs alongside UNet and anomaly detection.
