
What is Stable Diffusion from Scratch?
This project implements denoising diffusion probabilistic models (DDPM) and DDIM samplers from scratch in PyTorch, training them on the MNIST and CIFAR-10 datasets. If you want to understand how diffusion models work step by step, without relying on Hugging Face or the diffusers library, this is the perfect starting point. By building these models yourself, you’ll gain deep insights into:
- How the forward diffusion process gradually adds noise to images (sketched in code after this list)
- How the reverse process learns to denoise and generate new samples
- The mathematics behind beta schedules and timestep sampling
- The architecture of U-Nets with time embeddings and self-attention
- Trade-offs between DDPM and DDIM sampling methods
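To make the forward process concrete, here is a minimal PyTorch sketch of the closed-form noising step. The helper name q_sample and the precomputed alphas_cumprod tensor are illustrative assumptions, not the project’s exact API:

```python
import torch

def q_sample(x0, t, alphas_cumprod):
    """Noise clean images x0 directly to timestep t in one shot:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I)."""
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)   # per-sample cumulative alpha
    eps = torch.randn_like(x0)                   # fresh Gaussian noise
    x_t = abar.sqrt() * x0 + (1.0 - abar).sqrt() * eps
    return x_t, eps
```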
Key features
MNIST DDPM
U-Net architecture with cosine noise schedule, sinusoidal time embeddings, and self-attention in the bottleneck (the schedule and embedding are sketched in code after this feature list)
CIFAR-10 DDPM
Deeper U-Net with multi-resolution self-attention, dropout regularization, and exponential moving average (EMA) weights
DDIM samplers
Fast deterministic sampling with configurable step counts (10-1000), enabling speed/quality trade-off analysis
Training utilities
Reproducible training scripts with early stopping, loss curves, sample visualization, and timestep analysis
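For reference, here is a hedged sketch of the two MNIST-model ingredients named above: the cosine beta schedule (proposed by Nichol & Dhariwal, 2021) and a Transformer-style sinusoidal timestep embedding. Function names are illustrative and may not match the project’s code:

```python
import math
import torch

def cosine_beta_schedule(T, s=0.008):
    """Cosine schedule: alpha-bar follows a squared-cosine curve, giving a
    gentler noise ramp than a linear schedule (Nichol & Dhariwal, 2021)."""
    steps = torch.arange(T + 1, dtype=torch.float64)
    abar = torch.cos(((steps / T) + s) / (1 + s) * math.pi / 2) ** 2
    abar = abar / abar[0]                         # normalize so abar_0 = 1
    betas = 1.0 - (abar[1:] / abar[:-1])          # beta_t = 1 - abar_t / abar_{t-1}
    return betas.clamp(max=0.999).float()

def sinusoidal_embedding(t, dim):
    """Map integer timesteps to sin/cos features, as in Transformer positions."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :].to(t.device)
    return torch.cat([args.sin(), args.cos()], dim=-1)   # shape (batch, dim)
```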
Architecture overview
The core models use a U-Net architecture with residual blocks, group normalization, and time conditioning. The model learns to predict the noise that was added to an image at a given timestep, not the clean image directly. This is a key insight from the DDPM paper.
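A minimal sketch of what that objective looks like in practice, reusing the q_sample helper sketched earlier (model(x_t, t) and the other names are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def training_step(model, x0, alphas_cumprod, T):
    """One DDPM training step: the U-Net predicts the noise eps, not x0."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)  # random timesteps
    x_t, eps = q_sample(x0, t, alphas_cumprod)                 # noised inputs
    eps_pred = model(x_t, t)                                   # U-Net conditioned on t
    return F.mse_loss(eps_pred, eps)                           # simple MSE objective
```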
Design philosophy
This implementation is intentionally kept minimal and explicit:
- No hidden training frameworks or complex configuration systems
- Every modeling decision is visible in the Python files
- Clear separation between models, training, and utilities
- Easy to modify and adapt to your own datasets
Core models
diffusion.py and diffusion_cifar.py contain the U-Net architectures and diffusion processes
Training scripts
train_diffusion.py and train_diffusion_cifar.py handle the training loops with early stopping
Utilities
DDIM comparisons, interpolation experiments, and timestep analysis scripts
What you’ll learn
By working through this project, you’ll understand:
Forward diffusion process
How to gradually add Gaussian noise to images using a predefined beta schedule
U-Net architecture
How to build an encoder-decoder with skip connections, time embeddings, and self-attention (see the residual-block sketch after this list)
Sampling methods
The difference between DDPM (1000 steps) and DDIM (as few as 10 steps) for generation
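For the U-Net point above, this is the time-conditioned residual block pattern in sketch form; channel counts and names are illustrative, and the project’s blocks may differ in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Residual block with group normalization and additive time conditioning.
    Assumes ch is divisible by the number of groups (8 here)."""
    def __init__(self, ch, t_dim):
        super().__init__()
        self.norm1 = nn.GroupNorm(8, ch)
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.t_proj = nn.Linear(t_dim, ch)            # time embedding -> channels
        self.norm2 = nn.GroupNorm(8, ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x, t_emb):
        h = self.conv1(F.silu(self.norm1(x)))
        h = h + self.t_proj(F.silu(t_emb))[:, :, None, None]  # broadcast over H, W
        h = self.conv2(F.silu(self.norm2(h)))
        return x + h                                  # residual skip connection
```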
Theoretical foundation
This implementation is based on two foundational papers:
Denoising Diffusion Probabilistic Models (Ho et al., NeurIPS 2020)
Introduced the DDPM framework with a fixed forward process and a learned reverse process. The model is trained to predict the noise added at each timestep using a simple MSE loss.
Key contributions:
- Simplified training objective (predict noise, not images)
- Fixed linear beta schedule for the forward process (the cosine schedule used in this project was proposed later by Nichol & Dhariwal, 2021)
- Connection to score-based generative models
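To connect these ideas to code, here is a sketch of one ancestral DDPM sampling step under the sigma_t^2 = beta_t variance choice; tensor names are illustrative, and it assumes the whole batch shares one timestep during sampling:

```python
import torch

@torch.no_grad()
def ddpm_step(model, x_t, t, betas, alphas, alphas_cumprod):
    """One reverse step: posterior mean from the predicted noise, plus fresh
    Gaussian noise at every step except t = 0."""
    eps_pred = model(x_t, t)
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    alpha = alphas[t].view(-1, 1, 1, 1)
    beta = betas[t].view(-1, 1, 1, 1)
    mean = (x_t - beta / (1.0 - abar).sqrt() * eps_pred) / alpha.sqrt()
    noise = torch.randn_like(x_t) if t[0] > 0 else torch.zeros_like(x_t)
    return mean + beta.sqrt() * noise
```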
Denoising Diffusion Implicit Models (Song et al., ICLR 2021)
Extended DDPM with a deterministic sampling process that allows skipping timesteps without retraining. This enables 10-100x faster generation with minimal quality loss.
Key contributions:
- Non-Markovian forward process
- Deterministic sampling (η=0) for reproducibility
- Flexible step counts for speed/quality trade-offs
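And the corresponding deterministic DDIM step with η=0, where t_prev is the previous timestep in a (possibly strided) sampling schedule; again a sketch under assumed names, not the project’s exact code:

```python
import torch

@torch.no_grad()
def ddim_step(model, x_t, t, t_prev, alphas_cumprod):
    """One deterministic DDIM step: predict x0, then move it back to t_prev.
    Skipping timesteps this way requires no retraining."""
    abar_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    abar_prev = alphas_cumprod[t_prev].view(-1, 1, 1, 1)
    eps_pred = model(x_t, t)
    x0_pred = (x_t - (1.0 - abar_t).sqrt() * eps_pred) / abar_t.sqrt()  # predicted clean image
    return abar_prev.sqrt() * x0_pred + (1.0 - abar_prev).sqrt() * eps_pred
```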
Next steps
Quick start
Train your first MNIST diffusion model in under 10 minutes
Installation
Set up your environment and install dependencies
