This project implements a Stable Diffusion model from scratch using PyTorch. Stable Diffusion is a latent diffusion model that generates high-quality images from text prompts. The implementation includes all core components: VAE (Variational Autoencoder), CLIP (Contrastive Language-Image Pretraining), U-Net with attention, and the diffusion process.
stable diffusion model/
├── README.txt # This file: Project overview and file descriptions
├── stable diffusion model notes.docx # Detailed notes on the implementation
├── data/ # Model weights and tokenizer data
│ ├── v1-5-pruned-emaonly.ckpt # Stable Diffusion v1.5 model weights (main checkpoint)
│ ├── hollie-mengert.ckpt # Alternative model checkpoint (possibly fine-tuned)
│ ├── tokenizer_vocab.json # CLIP tokenizer vocabulary file
│ └── tokenizer_merges.txt # CLIP tokenizer byte-pair encoding merges
├── image/ # Sample images for testing
│ ├── 1.png # Input sample image
│ └── generated_image_1.png # Generated output image
├── images/ # Directory for additional generated images (currently empty)
└── sd/ # Main implementation directory
├── encoder.py # VAE Encoder: Compresses images into latent space
├── decoder.py # VAE Decoder: Reconstructs images from latent space
├── attention.py # Attention mechanisms: Self-attention and cross-attention
├── clip.py # CLIP Text Encoder: Processes text prompts into embeddings
├── diffusion.py # Diffusion process: Forward (noise addition) and reverse (denoising)
├── ddpm.py # DDPM Sampler: Denoising Diffusion Probabilistic Model sampling algorithm
├── model_converter.py # Model Converter: Utilities for loading and converting model weights
├── model_loader.py # Model Loader: Functions to preload models from checkpoints
├── pipeline.py # Pipeline: High-level interface for text-to-image and image-to-image generation
├── demo.ipynb # Demo Notebook: Jupyter notebook demonstrating model usage
├── add_noise.ipynb # Noise Addition Notebook: Experiments with noise addition
└── __pycache__/ # Python bytecode cache (auto-generated)
VAE (Variational Autoencoder):
- Encoder (encoder.py): Compresses 512x512 RGB images into 64x64 latent representations
- Decoder (decoder.py): Reconstructs images from the latent space
- Uses residual blocks with group normalization and attention
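The 8x spatial compression described above (512x512 RGB down to a 64x64 latent with 4 channels, the layout Stable Diffusion v1.5 uses) can be illustrated with a toy stack of strided convolutions. This is only a shape sketch; the real encoder.py is far deeper, with residual blocks, group norm, and attention:

```python
import torch
import torch.nn as nn

# Toy sketch of the VAE encoder's downsampling path (NOT the real encoder.py):
# three stride-2 convolutions take 512x512 down to 64x64, ending in 4 channels.
toy_encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # 512 -> 256
    nn.SiLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 256 -> 128
    nn.SiLU(),
    nn.Conv2d(64, 4, kernel_size=3, stride=2, padding=1),   # 128 -> 64
)

image = torch.randn(1, 3, 512, 512)   # dummy RGB image batch
latent = toy_encoder(image)
print(latent.shape)                   # torch.Size([1, 4, 64, 64])
```

Working in this 4x64x64 latent space (48x fewer values than the 3x512x512 pixel space) is what makes diffusion tractable on consumer hardware.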
CLIP (Contrastive Language-Image Pretraining):
- Text Encoder (clip.py): Converts text prompts into embeddings
- Tokenizer (data files): Splits text input into tokens using the vocabulary and byte-pair encoding merges
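As a rough sketch of what the text encoder produces: the vocabulary size (49408), context length (77 tokens), and embedding width (768) below are the CLIP ViT-L/14 values that Stable Diffusion v1.5 uses; the actual classes in clip.py add transformer layers on top of these embeddings:

```python
import torch
import torch.nn as nn

# Sketch of the CLIP text encoder's input stage (not the project's clip.py API):
# token ids are mapped to 768-dim vectors and given learned position offsets.
vocab_size, context_len, embed_dim = 49408, 77, 768
token_embedding = nn.Embedding(vocab_size, embed_dim)
position_embedding = nn.Parameter(torch.zeros(context_len, embed_dim))

token_ids = torch.randint(0, vocab_size, (1, context_len))  # dummy prompt tokens
hidden = token_embedding(token_ids) + position_embedding    # (1, 77, 768)
print(hidden.shape)
```

The (1, 77, 768) tensor that comes out of the full text encoder is what the U-Net's cross-attention layers consume.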
U-Net with Attention:
- Attention (attention.py): Self-attention for spatial relationships, cross-attention for text-image fusion
- Integrated into the encoder and decoder for feature refinement
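A minimal single-head sketch of the cross-attention that fuses text into image features (attention.py's real classes add multi-head projections and output layers; the shapes assume a 64x64 latent flattened to 4096 spatial tokens and text features already projected to the same width):

```python
import torch
import torch.nn.functional as F

# Cross-attention sketch: image latents supply the queries, text embeddings
# the keys and values, so each spatial position attends over the prompt.
d = 64
image_tokens = torch.randn(1, 4096, d)   # 64x64 positions, flattened
text_tokens = torch.randn(1, 77, d)      # text features (projected to width d)

scores = image_tokens @ text_tokens.transpose(-1, -2) / d ** 0.5  # (1, 4096, 77)
weights = F.softmax(scores, dim=-1)      # each row sums to 1 over the 77 tokens
fused = weights @ text_tokens            # (1, 4096, d) text-conditioned features
print(fused.shape)
```

Self-attention is the same computation with queries, keys, and values all drawn from the image tokens.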
Diffusion Process:
- Diffusion (diffusion.py): Implements the forward process (adding noise) and the reverse process (removing noise)
- DDPM (ddpm.py): Sampling algorithm for generating images from noise
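The forward process admits a closed form: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with abar_t the cumulative product of (1 - beta). A sketch using the "scaled linear" beta schedule commonly associated with Stable Diffusion v1.5 (the exact schedule in ddpm.py may differ):

```python
import torch

# DDPM forward-process sketch: noise a clean latent x0 directly to timestep t.
# Schedule values (0.00085 -> 0.0120 over 1000 steps, squared-linspace) are
# the commonly cited SD v1.5 training settings, assumed here.
T = 1000
betas = torch.linspace(0.00085 ** 0.5, 0.0120 ** 0.5, T) ** 2
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # decreases toward 0 as t grows

x0 = torch.randn(1, 4, 64, 64)    # clean latent
eps = torch.randn_like(x0)        # Gaussian noise
t = 500
xt = alphas_bar[t].sqrt() * x0 + (1 - alphas_bar[t]).sqrt() * eps
print(xt.shape)
```

The reverse process learned by the U-Net undoes this one step at a time: it predicts eps from x_t, and the sampler in ddpm.py uses that prediction to step toward x_0.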
Pipeline:
- Pipeline (pipeline.py): Orchestrates text-to-image and image-to-image generation
- Combines all components for end-to-end image generation
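A toy version of the loop the pipeline orchestrates: encode the prompt, iteratively denoise a random latent, then hand the result to the VAE decoder. All names here are stand-ins, not pipeline.py's actual API:

```python
import torch

# Stand-in for the U-Net: the real model predicts the noise present in
# `latent` at timestep t, conditioned on the text embeddings.
def toy_denoiser(latent, text_emb, t):
    return 0.1 * latent

text_emb = torch.randn(1, 77, 768)        # from the CLIP text encoder
latent = torch.randn(1, 4, 64, 64)        # start from pure Gaussian noise
for t in reversed(range(0, 1000, 100)):   # a coarse 10-step schedule
    eps = toy_denoiser(latent, text_emb, t)
    latent = latent - eps                 # stand-in for the DDPM update rule
# The VAE decoder would then map (1, 4, 64, 64) -> a (1, 3, 512, 512) image.
print(latent.shape)
```

Image-to-image follows the same loop, except the starting latent is an encoded input image with partial noise added rather than pure noise.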
Getting Started:
- Install required dependencies (PyTorch, transformers, etc.)
- Ensure model weights are in the data/ directory
- Run demo.ipynb to see the model in action
- Use pipeline.py for programmatic generation
Notes:
- This is an educational implementation, not optimized for production
- Model weights are from the official Stable Diffusion v1.5 release
- The implementation follows the original paper and the Hugging Face diffusers library structure