Emad-Almagedy/Stable-Diffusion

Stable Diffusion Model Implementation

This project implements a Stable Diffusion model from scratch using PyTorch. Stable Diffusion is a latent diffusion model that generates high-quality images from text prompts. The implementation includes all core components: VAE (Variational Autoencoder), CLIP (Contrastive Language-Image Pretraining), U-Net with attention, and the diffusion process.

Project Structure

stable diffusion model/
├── README.txt                           # This file: Project overview and file descriptions
├── stable diffusion model notes.docx    # Detailed notes on the implementation
├── data/                                # Model weights and tokenizer data
│   ├── v1-5-pruned-emaonly.ckpt         # Stable Diffusion v1.5 model weights (main checkpoint)
│   ├── hollie-mengert.ckpt              # Alternative model checkpoint (possibly fine-tuned)
│   ├── tokenizer_vocab.json             # CLIP tokenizer vocabulary file
│   └── tokenizer_merges.txt             # CLIP tokenizer byte-pair encoding merges
├── image/                               # Sample images for testing
│   ├── 1.png                            # Input sample image
│   └── generated_image_1.png            # Generated output image
├── images/                              # Directory for additional generated images (currently empty)
└── sd/                                  # Main implementation directory
    ├── encoder.py                       # VAE Encoder: Compresses images into latent space
    ├── decoder.py                       # VAE Decoder: Reconstructs images from latent space
    ├── attention.py                     # Attention mechanisms: Self-attention and cross-attention
    ├── clip.py                          # CLIP Text Encoder: Processes text prompts into embeddings
    ├── diffusion.py                     # Diffusion process: Forward (noise addition) and reverse (denoising)
    ├── ddpm.py                          # DDPM Sampler: Denoising Diffusion Probabilistic Model sampling algorithm
    ├── model_converter.py               # Model Converter: Utilities for loading and converting model weights
    ├── model_loader.py                  # Model Loader: Functions to preload models from checkpoints
    ├── pipeline.py                      # Pipeline: High-level interface for text-to-image and image-to-image generation
    ├── demo.ipynb                       # Demo Notebook: Jupyter notebook demonstrating model usage
    ├── add_noise.ipynb                  # Noise Addition Notebook: Experiments with noise addition
    └── __pycache__/                     # Python bytecode cache (auto-generated)

Key Components

  1. VAE (Variational Autoencoder):

    • Encoder (encoder.py): Compresses 512x512 RGB images into 64x64 latent representations (8x spatial downsampling)
    • Decoder (decoder.py): Reconstructs images from latent space
    • Uses residual blocks with group normalization and attention
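The 8x compression above can be sketched with a few strided convolutions. This is an illustrative toy, not the repository's actual encoder.py (which uses residual blocks and attention); it only demonstrates how 512x512 pixels map onto a 64x64 latent grid, with the 4 latent channels used by Stable Diffusion v1.5:

```python
import torch
import torch.nn as nn

# Toy VAE-encoder sketch: three stride-2 convolutions halve the spatial
# resolution each time, 512 -> 256 -> 128 -> 64, mirroring the 8x
# downsampling of the real encoder. GroupNorm + SiLU follow the same
# normalization/activation pattern the README describes.
class TinyVAEEncoder(nn.Module):
    def __init__(self, latent_channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1),   # 512 -> 256
            nn.GroupNorm(8, 32), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),  # 256 -> 128
            nn.GroupNorm(8, 64), nn.SiLU(),
            nn.Conv2d(64, latent_channels, 3, stride=2, padding=1),  # 128 -> 64
        )

    def forward(self, x):
        return self.net(x)

z = TinyVAEEncoder()(torch.randn(1, 3, 512, 512))
print(z.shape)  # torch.Size([1, 4, 64, 64])
```

Running the diffusion process on this 64x64x4 latent instead of the 512x512x3 image is what makes latent diffusion tractable.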
  2. CLIP (Contrastive Language-Image Pretraining):

    • Text Encoder (clip.py): Converts text prompts to embeddings
    • Tokenizer (data files): Processes text input into tokens using vocabulary and merges
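The text-encoder interface can be sketched as follows. The layers here are placeholders, not the real transformer in clip.py, but the numbers match Stable Diffusion v1.5's CLIP text encoder: prompts are padded/truncated to 77 tokens, the BPE vocabulary (from tokenizer_vocab.json / tokenizer_merges.txt) has 49,408 entries, and each token becomes a 768-dimensional embedding:

```python
import torch
import torch.nn as nn

# Minimal sketch of the CLIP text-encoder interface: token ids in,
# one 768-dim context vector per token out. The real encoder stacks
# transformer layers; here a token embedding plus a learned positional
# embedding stands in for the whole thing.
class ToyTextEncoder(nn.Module):
    def __init__(self, vocab_size=49408, seq_len=77, dim=768):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.pos_emb = nn.Parameter(torch.zeros(seq_len, dim))

    def forward(self, tokens):
        return self.token_emb(tokens) + self.pos_emb

tokens = torch.randint(0, 49408, (1, 77))  # stand-in for tokenizer output
ctx = ToyTextEncoder()(tokens)
print(ctx.shape)  # torch.Size([1, 77, 768])
```

This [1, 77, 768] context tensor is what the U-Net's cross-attention layers consume during denoising.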
  3. U-Net with Attention:

    • Attention (attention.py): Self-attention for spatial relationships, cross-attention for text-image fusion
    • Integrated into encoder and decoder for feature refinement
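The cross-attention described above can be sketched in a few lines. This is a generic multi-head cross-attention (dimensions chosen to match a typical first U-Net stage and the CLIP context), not a copy of attention.py; the key idea is that queries come from image features while keys and values come from the text embeddings:

```python
import torch
import torch.nn as nn

# Cross-attention sketch: image features (queries) attend over the CLIP
# text context (keys/values), which is how the text prompt steers the
# denoising of each spatial location.
class CrossAttention(nn.Module):
    def __init__(self, dim=320, ctx_dim=768, heads=8):
        super().__init__()
        self.heads = heads
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(ctx_dim, dim, bias=False)
        self.v = nn.Linear(ctx_dim, dim, bias=False)
        self.out = nn.Linear(dim, dim)

    def forward(self, x, ctx):
        b, n, d = x.shape
        h = self.heads
        q = self.q(x).view(b, n, h, d // h).transpose(1, 2)
        k = self.k(ctx).view(b, -1, h, d // h).transpose(1, 2)
        v = self.v(ctx).view(b, -1, h, d // h).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-1, -2) / (d // h) ** 0.5, dim=-1)
        return self.out((attn @ v).transpose(1, 2).reshape(b, n, d))

x = torch.randn(1, 64 * 64, 320)   # flattened latent features
ctx = torch.randn(1, 77, 768)      # CLIP text embeddings
out = CrossAttention()(x, ctx)
print(out.shape)  # torch.Size([1, 4096, 320])
```

Self-attention is the same computation with ctx replaced by x itself, which is how spatial positions in the latent exchange information.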
  4. Diffusion Process:

    • Diffusion (diffusion.py): Implements forward process (adding noise) and reverse process (removing noise)
    • DDPM (ddpm.py): Sampling algorithm for generating images from noise
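The forward (noising) process has a simple closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. A minimal sketch, using the 1000-step scaled-linear beta schedule commonly used with Stable Diffusion (the exact schedule in ddpm.py may differ):

```python
import torch

# DDPM forward-process sketch: precompute the noise schedule, then jump
# straight to any timestep t via the closed-form q(x_t | x_0).
T = 1000
betas = torch.linspace(0.00085 ** 0.5, 0.0120 ** 0.5, T) ** 2
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def add_noise(x0, t, noise):
    # Scale the clean latent down and mix Gaussian noise in; at large t
    # almost no signal remains.
    a = alphas_bar[t].sqrt().view(-1, 1, 1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1, 1, 1)
    return a * x0 + s * noise

x0 = torch.randn(1, 4, 64, 64)                    # a clean latent
noisy = add_noise(x0, torch.tensor([999]), torch.randn_like(x0))
print(noisy.shape)  # torch.Size([1, 4, 64, 64])
```

The reverse process is learned: the U-Net predicts the noise eps at each step, and the sampler inverts this formula step by step from pure noise back to a clean latent.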
  5. Pipeline:

    • Pipeline (pipeline.py): Orchestrates text-to-image and image-to-image generation
    • Combines all components for end-to-end image generation
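The orchestration can be sketched end-to-end with stub components. The function and argument names below are illustrative, not the repository's actual pipeline.py API, and the update rule is a toy stand-in for the real DDPM step:

```python
import torch

# Pipeline sketch: start from Gaussian noise in latent space, repeatedly
# ask the denoiser to predict noise (conditioned on the text context),
# then decode the final latent back to pixel space.
def generate(ctx, denoiser, decoder, steps=50, latent_shape=(1, 4, 64, 64)):
    x = torch.randn(latent_shape)            # pure noise latent
    for t in reversed(range(steps)):
        eps = denoiser(x, t, ctx)            # U-Net noise prediction
        x = x - eps / steps                  # toy update; real code uses the DDPM rule
    return decoder(x)                        # VAE decoder: latent -> image

# Stub components so the sketch runs end-to-end.
ctx = torch.randn(1, 77, 768)                               # CLIP context
denoiser = lambda x, t, c: torch.zeros_like(x)              # stand-in U-Net
decoder = lambda z: torch.nn.functional.interpolate(z[:, :3], scale_factor=8)
img = generate(ctx, denoiser, decoder)
print(img.shape)  # torch.Size([1, 3, 512, 512])
```

Image-to-image generation differs only in the starting point: instead of pure noise, the input image is encoded to a latent and partially noised before denoising begins.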

Usage

  1. Install required dependencies (PyTorch, transformers, etc.)
  2. Ensure model weights are in the data/ directory
  3. Run demo.ipynb to see the model in action
  4. Use pipeline.py for programmatic generation

Notes

  • This is an educational implementation, not optimized for production
  • Model weights are from the official Stable Diffusion v1.5 release
  • Implementation follows the original latent diffusion paper and the structure of Hugging Face's diffusers library
