A Denoising Diffusion Implicit Model trained from scratch on 30,000 faces — no pretrained weights, no diffusers library. Pure PyTorch.
Demo features:
- ✨ Generate — sample new faces from pure noise with adjustable DDIM steps
- 🎞️ Trajectory — animated GIF showing the full denoising path (noise → face)
- 🔀 Interpolate — spherical linear interpolation (slerp) between two faces
- 📖 How it works — full architecture & training breakdown at the bottom of the page
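The Interpolate tab's slerp can be sketched as below. This is a generic spherical-interpolation routine, not the repo's exact implementation; the function name and the lerp fallback for near-parallel vectors are illustrative:

```python
import torch

def slerp(z0, z1, t):
    """Spherical linear interpolation between two noise tensors.

    Interpolating on the hypersphere (rather than linearly) keeps the
    intermediate latents at a norm the model actually saw during training.
    """
    a, b = z0.flatten(), z1.flatten()
    # Angle between the two vectors, clamped for numerical safety
    omega = torch.acos(torch.clamp(torch.dot(a, b) / (a.norm() * b.norm()), -1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < 1e-8:  # nearly parallel: fall back to plain lerp
        return (1.0 - t) * z0 + t * z1
    return (torch.sin((1.0 - t) * omega) / so) * z0 + (torch.sin(t * omega) / so) * z1
```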
| | |
|---|---|
| Architecture | U-Net with sinusoidal time embeddings + multi-head self-attention |
| Channels | [64, 128, 256, 256] |
| Parameters | 25.6M |
| Dataset | CelebA-HQ — 30,000 aligned faces at 64×64 |
| Training | 100 epochs, ~40 hours, Apple Silicon MPS (no cloud GPU) |
| Sampler | DDIM — 20 steps vs DDPM 1000 steps (50× speedup) |
| Noise schedule | Linear β: 1×10⁻⁴ → 0.02, T = 1000 |
| Inference weights | EMA (exponential moving average of training weights) |
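The linear noise schedule from the table takes two lines in PyTorch (a sketch of the standard construction, not necessarily the exact code in `config.py`/`diffusion.py`):

```python
import torch

# Linear beta schedule: 1e-4 → 0.02 over T = 1000 steps
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # ᾱ_t, used by both training and sampling
```

ᾱ_T ends up near zero, so x_T is (almost) pure Gaussian noise, which is what lets sampling start from `torch.randn`.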
```
Input x_t (noisy image) + timestep t
        │
┌───────▼────────┐
│ Time Embedding │  Sinusoidal → MLP → injected at every ResBlock
└───────┬────────┘
        │
┌───────▼────────┐
│     U-Net      │  4 resolution levels
│                │  Self-attention at 8×8 and 16×16
│  Down → Mid    │  GroupNorm + SiLU throughout
│     → Up       │  Zero-init output conv (identity at init)
└───────┬────────┘
        │
predicted ε (noise)
```
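The sinusoidal time embedding in the diagram is the transformer-style encoding; a minimal sketch (the function name is illustrative, and the repo's `unet.py` may differ in details such as the sin/cos ordering):

```python
import math
import torch

def sinusoidal_embedding(t, dim):
    """Encode integer timesteps t of shape (B,) as (B, dim) sin/cos features.

    Frequencies are geometrically spaced from 1 down to 1/10000, so each
    timestep gets a unique, smoothly varying fingerprint.
    """
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t.float()[:, None] * freqs[None, :]          # (B, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```

This vector then passes through the MLP and is added into every ResBlock, which is how a single U-Net can denoise at all 1000 noise levels.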
Training objective: L = ||ε − ε_θ(√ᾱₜ x₀ + √(1−ᾱₜ) ε, t)||²
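In code, one training step of that objective looks roughly like this (a sketch of the standard DDPM loss under the schedule above; the helper name is illustrative):

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0, alpha_bars):
    """Sample a timestep and noise, corrupt x0, and regress the noise."""
    B = x0.shape[0]
    t = torch.randint(0, len(alpha_bars), (B,))          # random timestep per image
    eps = torch.randn_like(x0)                           # the ε the model must recover
    ab = alpha_bars[t].view(B, 1, 1, 1)                  # ᾱ_t, broadcast over CHW
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps         # forward diffusion in one shot
    return F.mse_loss(model(x_t, t), eps)
```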
```
minidiffusion/
├── models/
│   ├── attention.py   # Multi-head self-attention (2D spatial)
│   ├── unet.py        # Full U-Net with time embeddings
│   └── diffusion.py   # DDPM training + DDIM sampling + EMA + AdamW
├── utils/
│   ├── dataset.py     # CelebA-HQ dataloader
│   └── visualize.py   # Trajectory GIF, interpolation grid
├── train.py           # Training loop — W&B logging, auto-resume
├── sample.py          # Inference — grid, trajectory, interpolation, compare
├── app.py             # Gradio demo UI
└── config.py          # All hyperparameters
```
Every component is hand-written — no diffusers, no guided-diffusion, no pretrained encoders:
attention.py · unet.py · diffusion.py · dataset.py · train.py
Notable engineering decisions:
- Custom CPU-resident AdamW — fixes an MPS NaN bug in PyTorch 2.3.1 where zero-grad params corrupt optimizer state, while also saving ~2GB of GPU memory
- EMA shadow on CPU — keeps a smoothed copy of weights off the GPU, saving another ~1GB
- MPS-safe DDIM indexing — tensor indexing with MPS buffers returns garbage in some PyTorch builds; fixed by using Python ints throughout the sampling loop
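The deterministic DDIM update (η = 0) with the plain-int indexing described above can be sketched as follows. This is the textbook update, not a copy of the repo's `diffusion.py`; the function name is illustrative:

```python
import torch

@torch.no_grad()
def ddim_step(model, x_t, t, t_prev, alpha_bars):
    """One deterministic DDIM update from timestep t to t_prev (η = 0).

    t and t_prev are plain Python ints: indexing the schedule with ints
    avoids the MPS tensor-indexing bug described above.
    """
    ab_t = alpha_bars[t]                                         # scalar ᾱ_t
    ab_prev = alpha_bars[t_prev] if t_prev >= 0 else torch.tensor(1.0)
    # Model still needs a tensor of timesteps, one per batch element
    eps = model(x_t, torch.full((x_t.shape[0],), t, dtype=torch.long))
    # Predict x0 from the current noisy image, then jump to t_prev
    x0_pred = (x_t - (1 - ab_t).sqrt() * eps) / ab_t.sqrt()
    return ab_prev.sqrt() * x0_pred + (1 - ab_prev).sqrt() * eps
```

Skipping from t to a much earlier t_prev in a single update like this is what makes 20 DDIM steps competitive with 1000 DDPM steps.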
```bash
git clone https://github.com/Gh-Novel/DDIM_Image_Generation.git
cd DDIM_Image_Generation
pip install -r requirements.txt

# Run the Gradio demo (uses bundled checkpoint)
python app.py

# Or generate samples directly
python sample.py --ckpt checkpoints/stage-64_best.pt --num 16 --steps 50

# Train from scratch on your own data
python train.py --image-size 64 --epochs 100 --run-name my-run
```
