Real-ESRGAN uses a two-stage training process to achieve high-quality image super-resolution. This approach combines the stability of L1 loss with the perceptual quality improvements from GAN training.
Training Process
The training is divided into two distinct stages that share the same data synthesis process and training pipeline, but differ in their loss functions:
Stage 1: Train Real-ESRNet
Starting from a pre-trained ESRGAN model, train Real-ESRNet with L1 loss. This stage provides a stable foundation and reduces the risk of instabilities such as mode collapse in the adversarial training that follows.
- Uses L1 loss only
- Starts from pre-trained ESRGAN weights
- Results in a stable base model
Stage 2: Train Real-ESRGAN
Use the trained Real-ESRNet model to initialize the generator, then train Real-ESRGAN with a combination of L1 loss, perceptual loss, and GAN loss.
- Uses L1, perceptual, and GAN losses
- Starts from the stage 1 Real-ESRNet weights
- Results in the final model
Why Two Stages?
This two-stage approach offers several advantages:
- Stability: Starting with L1 loss provides a stable baseline before introducing adversarial training
- Quality: The combination of losses in stage 2 improves perceptual quality while maintaining fidelity
- Convergence: Pre-training with L1 loss helps the GAN training converge more reliably
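To make the difference between the two stages concrete, here is a minimal sketch of how the generator objective might be assembled. It assumes that stage 2 adds a perceptual term and an adversarial term to the L1 term, as described for Real-ESRGAN upstream. The function names and the default weights (1.0, 1.0, 0.1) are illustrative assumptions, not values taken from this page, and the perceptual and adversarial values are passed in as plain numbers rather than computed by real networks.

```python
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equally sized flat pixel sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def generator_loss(
    stage: int,
    pred: Sequence[float],
    target: Sequence[float],
    perceptual: float = 0.0,   # placeholder for a feature-space distance
    adversarial: float = 0.0,  # placeholder for the GAN term on the generator
    w_l1: float = 1.0,         # assumed weight
    w_percep: float = 1.0,     # assumed weight
    w_gan: float = 0.1,        # assumed weight
) -> float:
    """Combine the loss terms for the given training stage."""
    pixel = w_l1 * l1_loss(pred, target)
    if stage == 1:
        # Stage 1 (Real-ESRNet): L1 loss only.
        return pixel
    if stage == 2:
        # Stage 2 (Real-ESRGAN): L1 plus perceptual plus adversarial terms.
        return pixel + w_percep * perceptual + w_gan * adversarial
    raise ValueError("stage must be 1 or 2")


pred, target = [0.0, 1.0], [0.5, 0.5]
stage1 = generator_loss(1, pred, target)
stage2 = generator_loss(2, pred, target, perceptual=0.2, adversarial=1.0)
```

Because stage 1 ignores the extra terms entirely, the same data pipeline and training loop can serve both stages; only the objective changes.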
Training Requirements
Hardware
- Multiple GPUs recommended (examples use 4 GPUs)
- Single GPU training is supported but slower
- Adequate disk space for datasets
Datasets
Real-ESRGAN is trained on:
- DF2K: Combination of DIV2K and Flickr2K datasets
- OST: Outdoor Scene Training dataset
The degradation process simulates real-world image degradation, including blur, noise, compression artifacts, and downsampling.
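The idea of synthesizing a degraded input from a clean image can be illustrated with a toy pipeline. The sketch below is not the Real-ESRGAN degradation model: it works on a small grayscale image stored as nested lists, uses a box blur, nearest-neighbour downsampling, Gaussian noise, and gray-level quantization as a crude stand-in for compression artifacts, and the order of the steps is an assumption. All function names and parameters are hypothetical.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, values in [0, 1]


def box_blur(img: Image) -> Image:
    """3x3 box blur; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def downsample(img: Image, scale: int) -> Image:
    """Keep every `scale`-th pixel in both directions."""
    return [row[::scale] for row in img[::scale]]


def add_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Additive Gaussian noise, clipped back to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def quantize(img: Image, levels: int) -> Image:
    """Round to `levels` gray levels (toy substitute for compression)."""
    return [[round(v * (levels - 1)) / (levels - 1) for v in row] for row in img]


def degrade(img: Image, scale: int = 4, sigma: float = 0.05,
            levels: int = 16, seed: int = 0) -> Image:
    """Blur, downsample, add noise, then quantize (assumed order)."""
    rng = random.Random(seed)
    out = box_blur(img)
    out = downsample(out, scale)
    out = add_noise(out, sigma, rng)
    return quantize(out, levels)


# An 8x8 diagonal gradient becomes a 2x2 low-quality image at scale 4.
hr = [[(x + y) / 14 for x in range(8)] for y in range(8)]
lr = degrade(hr, scale=4)
```

During training, the clean image plays the role of the ground truth and the degraded image is the network input, so the model learns to invert the synthetic degradation.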
Training Modes
Debug Mode
Test your configuration before full training.
Full Training
Run the complete training process. The --auto_resume flag automatically resumes training from the last checkpoint if interrupted.
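The two modes differ mainly in the flags passed to the training script. The sketch below assembles the command lines as argument lists so they can be inspected before running. Apart from `--auto_resume`, nothing here is stated on this page: the script path `realesrgan/train.py`, the option file `options/train_realesrnet_x4plus.yml`, the `-opt`, `--launcher`, and `--debug` flags, and the `torch.distributed.launch` invocation are assumptions based on the upstream Real-ESRGAN repository, so check them against your checkout. `build_train_command` is a hypothetical helper.

```python
import shlex
from typing import List


def build_train_command(
    opt_path: str,
    num_gpus: int = 1,
    debug: bool = False,
    auto_resume: bool = False,
    master_port: int = 4321,  # assumed port for the distributed launcher
) -> List[str]:
    """Return the argv list for a training run (assumed script and flags)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus > 1:
        # Multi-GPU: go through the PyTorch distributed launcher.
        cmd = [
            "python", "-m", "torch.distributed.launch",
            f"--nproc_per_node={num_gpus}",
            f"--master_port={master_port}",
            "realesrgan/train.py", "-opt", opt_path,
            "--launcher", "pytorch",
        ]
    else:
        # Single GPU: call the training script directly.
        cmd = ["python", "realesrgan/train.py", "-opt", opt_path]
    if debug:
        cmd.append("--debug")        # short run to validate the configuration
    if auto_resume:
        cmd.append("--auto_resume")  # continue from the last checkpoint
    return cmd


OPT = "options/train_realesrnet_x4plus.yml"  # assumed option file
debug_cmd = build_train_command(OPT, num_gpus=1, debug=True)
full_cmd = build_train_command(OPT, num_gpus=4, auto_resume=True)
print(shlex.join(debug_cmd))
print(shlex.join(full_cmd))
```

Running the debug command first on a single GPU is a cheap way to catch path or configuration mistakes before committing several GPUs to a full run.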
Next Steps
Dataset Preparation
Learn how to prepare and process training datasets
Train Real-ESRNet
Start with stage 1 training
Train Real-ESRGAN
Complete stage 2 for the final model
Fine-tuning
Adapt the model to your custom dataset