Overview
Diffusion models gradually denoise random noise into structured data through a learned reverse process. Alpamayo R1 implements flow matching, a modern diffusion technique that offers:
- Faster sampling: Straight paths in probability space reduce required steps
- Training stability: Direct velocity field prediction avoids noise schedule tuning
- Flexibility: Easy integration with conditional generation
Base Diffusion Interface
All diffusion models inherit from `BaseDiffusion`; see `base.py:45-89` for the complete interface.
Step Function Protocol
The `step_fn` is a callable that denoises data at each timestep.
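As an illustration, a step function takes the current state and timestep and returns a prediction; the exact signature below is an assumption, not the repository's interface (for flow matching the prediction is a velocity):

```python
import numpy as np

def step_fn(x_t: np.ndarray, t: float) -> np.ndarray:
    """Hypothetical step function: predict the velocity v(x_t, t).

    In the real model this is a learned network; here a dummy closed-form
    velocity that pushes the state toward zero stands in for it.
    """
    return -x_t  # placeholder: a trained network replaces this

x = np.random.randn(4, 2)   # batch of noisy states
v = step_fn(x, t=0.5)       # predicted velocity at t = 0.5
assert v.shape == x.shape   # the step function preserves the data shape
```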
Flow Matching
`FlowMatching` is the primary diffusion implementation in Alpamayo R1.
How Flow Matching Works
Flow matching learns to transform noise into data by predicting velocity fields along optimal transport paths:

1. Training: Learn a velocity field `v(x, t)` that pushes noise toward data
   - Start: `x₀ ~ N(0, I)` (random noise)
   - End: `x₁ ~ p_data` (real trajectory)
   - Path: `x_t = t·x₁ + (1-t)·x₀` for `t ∈ [0, 1]`
   - Objective: Predict `v(x_t, t) = x₁ - x₀`
2. Sampling: Integrate the learned velocity field from noise to data
   - Initialize: `x ~ N(0, I)`
   - Evolve: `dx/dt = v(x, t)` for `t: 0 → 1`
   - Result: Realistic trajectory sample
- Flow Matching for Generative Modeling (Lipman et al., 2023)
- Guided Flows for Generative Modeling and Decision Making (Zheng et al., 2023)
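As a quick numeric sanity check of the path definition above (a numpy sketch, not repository code): differentiating `x_t = t·x₁ + (1-t)·x₀` with respect to `t` recovers exactly the training target `x₁ - x₀`.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal(3)        # noise sample, x0 ~ N(0, I)
x1 = rng.standard_normal(3)        # stand-in for a data sample

def path(t):
    return t * x1 + (1 - t) * x0   # linear interpolation x_t

# Finite-difference derivative of the path at t = 0.4
eps = 1e-6
dx_dt = (path(0.4 + eps) - path(0.4 - eps)) / (2 * eps)

# The derivative matches the flow-matching target x1 - x0
assert np.allclose(dx_dt, x1 - x0, atol=1e-5)
```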
Sampling with Euler Integration
The `sample()` method implements forward Euler integration.
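A minimal sketch of forward Euler integration under those assumptions (the real `sample()` will differ in details such as shapes, device handling, and conditioning):

```python
import numpy as np

def sample(step_fn, x_dims, num_inference_steps=10, rng=None):
    """Sketch: integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data)."""
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(x_dims)          # start from noise, x ~ N(0, I)
    dt = 1.0 / num_inference_steps
    for i in range(num_inference_steps):
        x = x + dt * step_fn(x, i * dt)      # Euler update
    return x

# Demo with an analytically known velocity field: the straight-line flow
# toward a fixed `target`. Because the path is linear, Euler integration
# lands on the endpoint exactly.
target = np.full(4, 2.0)

def oracle_velocity(x, t):
    return (target - x) / (1.0 - t)          # valid for t < 1

result = sample(oracle_velocity, x_dims=(4,), num_inference_steps=10)
assert np.allclose(result, target)
```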
Usage Example
A complete example of sampling trajectories combines these pieces: construct a `FlowMatching` model, supply a `step_fn`, and call `sample()`.

Configuration Options
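To make these options concrete, here is a hypothetical config mirror of the table in this section (the real constructor lives in `diffusion/flow_matching.py` and may differ):

```python
from dataclasses import dataclass

@dataclass
class FlowMatchingConfig:
    """Hypothetical mirror of the FlowMatching configuration options."""
    x_dims: list[int]                 # required: dimensions of output data
    int_method: str = "euler"         # only "euler" is currently supported
    num_inference_steps: int = 10     # number of denoising iterations

    def __post_init__(self):
        if self.int_method != "euler":
            raise ValueError("only 'euler' integration is supported")

# Example: a trajectory shaped (waypoints, state dims) -- values are illustrative
cfg = FlowMatchingConfig(x_dims=[24, 3])
```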
| Parameter | Type | Default | Description |
|---|---|---|---|
| `x_dims` | `list[int]` | Required | Dimensions of output data |
| `int_method` | `str` | `"euler"` | Integration method (currently only `"euler"`) |
| `num_inference_steps` | `int` | `10` | Number of denoising iterations |
Returning Intermediate Steps
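A sketch of how a sampler might expose the full denoising trace (the function name, flag, and return shape here are assumptions, not the repository's API):

```python
import numpy as np

def sample_with_trace(step_fn, x_dims, num_inference_steps=10, rng=None):
    """Euler sampling that also records every intermediate state."""
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(x_dims)
    dt = 1.0 / num_inference_steps
    trace = [x.copy()]                # index 0 holds the initial noise
    for i in range(num_inference_steps):
        x = x + dt * step_fn(x, i * dt)
        trace.append(x.copy())
    # final sample plus stacked states of shape (num_inference_steps + 1, *x_dims)
    return x, np.stack(trace)

final, trace = sample_with_trace(lambda x, t: -x, x_dims=(2,), num_inference_steps=5)
assert trace.shape == (6, 2)
```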
For visualization or analysis, you can retrieve all intermediate denoising steps.

Training Flow Matching Models
While the sampling code is shown above, training typically follows a simple pattern: sample noise and data, interpolate between them, and regress the predicted velocity onto the target. The target `v = x₁ - x₀` is simply the direction from noise to data.
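That pattern can be sketched in a few lines of numpy (the "model" here is a single learnable constant, purely for illustration; the real training loop uses the repository's network and optimizer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "dataset": 1-D points clustered around 2.0
def sample_data(n):
    return 2.0 + 0.1 * rng.standard_normal((n, 1))

# Trivial model: v(x, t) = w, a single learnable constant velocity
w = np.zeros((1,))
lr = 0.1

for step in range(200):
    x1 = sample_data(64)                     # data batch
    x0 = rng.standard_normal(x1.shape)       # noise batch
    t = rng.uniform(size=(64, 1))            # random timesteps in [0, 1]
    x_t = t * x1 + (1 - t) * x0              # interpolated state (unused by this toy model)
    target = x1 - x0                         # flow-matching regression target
    pred = np.broadcast_to(w, target.shape)  # model prediction
    grad = 2 * (pred - target).mean(axis=0)  # gradient of the MSE loss w.r.t. w
    w = w - lr * grad

# w converges toward E[x1 - x0] = E[x1] - E[x0], which is about 2.0 here
```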
Inference Speed Considerations
Flow matching enables faster sampling than traditional diffusion models:

| Inference Steps | Latency (approx) | Quality |
|---|---|---|
| 1 | ~10ms | Low (single-step approximation) |
| 5 | ~50ms | Medium (good for real-time) |
| 10 | ~100ms | High (recommended default) |
| 20+ | ~200ms+ | Very high (diminishing returns) |
Advanced: Conditional Generation
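One way to thread an observation embedding into the step function is sketched below (all names are hypothetical; the real model fuses conditioning inside the network):

```python
import numpy as np

def make_conditional_step_fn(obs_embedding: np.ndarray):
    """Build a step_fn whose predicted velocity depends on an observation.

    Hypothetical sketch: here the condition simply sets the endpoint the
    flow is pushed toward; a learned model would consume the embedding
    as a network input instead.
    """
    def step_fn(x_t: np.ndarray, t: float) -> np.ndarray:
        # Velocity pointing from the current state toward the conditioned mean
        return obs_embedding - x_t
    return step_fn

obs = np.array([1.0, -1.0])          # e.g. encoded scene/route features
cond_step_fn = make_conditional_step_fn(obs)
v = cond_step_fn(np.zeros(2), t=0.0)
```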
Flow matching naturally supports conditional generation by including observations in the step function.

Best Practices
- Start with 10 inference steps: Good balance of speed and quality
- Use FP16/BF16: Mixed precision can speed up sampling 2x with minimal quality loss
- Batch inference: Process multiple samples in parallel for efficiency
- Cache features: If generating multiple samples for the same scene, encode observations once
- Compile models: Use `torch.compile()` for faster step function execution
Comparison to Other Diffusion Methods
| Method | Training | Sampling Speed | Implementation Complexity |
|---|---|---|---|
| DDPM | Stable | Slow (100+ steps) | Medium |
| DDIM | Stable | Medium (20-50 steps) | Medium |
| Flow Matching | Very stable | Fast (5-10 steps) | Low |
References
- Lipman et al. (2023). Flow Matching for Generative Modeling
- Zheng et al. (2023). Guided Flows for Generative Modeling and Decision Making
See `diffusion/base.py` and `diffusion/flow_matching.py` for full implementation details.