
Stable Diffusion 2.1 Fine-tuning with BOFT (DreamBooth)

AIST5030 Mini Project — Fine-tuning Stable Diffusion 2.1 using BOFT (Butterfly Orthogonal Fine-Tuning) on a DreamBooth-style dataset so the model learns a specific subject (a dog) and generates it in novel contexts.

Project Structure

├── config.yaml               # All training / inference hyperparameters
├── pyproject.toml            # Python project & dependency specification
├── data/                     # Training images (DreamBooth dog2 dataset)
├── src/sd21_boft/
│   ├── config_utils.py       # YAML config loading & model path resolution
│   ├── dataset.py            # DreamBooth dataset & data loader
│   ├── download_model.py     # Download pretrained model from HuggingFace Hub
│   ├── train.py              # BOFT fine-tuning training loop
│   ├── inference.py          # Generate images with original & fine-tuned model
│   ├── log_utils.py          # Logging setup
│   └── plot_utils.py         # Training loss curve plotting
├── output/                   # Saved BOFT adapter checkpoints & validation images
├── inference_output/         # Inference results (original vs. fine-tuned)
├── logs/                     # Training & inference log files, loss CSV
└── plots/                    # Loss curve plot

Setup

Prerequisites

  • Python >= 3.12
  • NVIDIA GPU with CUDA support
  • uv package manager (recommended)

Install Dependencies

uv sync

Or using pip:

pip install -e .

Prepare Data

Download the DreamBooth dog2 dataset from google/dreambooth and place the images into the data/ directory.
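
For reference, the image dataset can be sketched as follows. This is a simplified, hypothetical stand-in for src/sd21_boft/dataset.py, whose actual implementation may differ:

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image


class DreamBoothDataset(torch.utils.data.Dataset):
    """Minimal sketch: load subject images from data/ as normalized tensors."""

    def __init__(self, image_dir: str, size: int = 512):
        exts = {".jpg", ".jpeg", ".png"}
        self.paths = sorted(p for p in Path(image_dir).iterdir()
                            if p.suffix.lower() in exts)
        self.size = size

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        img = img.resize((self.size, self.size))
        # Scale pixels to [-1, 1] and move channels first, as SD's VAE expects.
        arr = np.asarray(img, dtype=np.float32) / 127.5 - 1.0
        return torch.from_numpy(arr).permute(2, 0, 1)
```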

Download Pretrained Model (Optional)

If local_model_dir is set in config.yaml, the pretrained Stable Diffusion 2.1 model will be automatically downloaded from HuggingFace Hub on first run. You can also trigger this manually:

uv run python -m sd21_boft.download_model
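
The download step can be sketched with huggingface_hub's snapshot_download. The ensure_model helper below is hypothetical (not the project's actual code) and only illustrates the download-on-first-run behavior described above:

```python
from pathlib import Path

from huggingface_hub import snapshot_download


def ensure_model(repo_id: str, local_model_dir: str) -> Path:
    """Download the pretrained pipeline once; later runs reuse the local copy.

    model_index.json is used here as a marker that a diffusers pipeline
    has already been downloaded into local_model_dir.
    """
    target = Path(local_model_dir)
    if not (target / "model_index.json").exists():
        snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Example (matches the model ID used in config.yaml):
# ensure_model("sd2-community/stable-diffusion-2-1", "models/sd21")
```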

Usage

Configuration

All hyperparameters are centralized in config.yaml. Key settings include:

Parameter            Description
pretrained_model     HuggingFace model ID (sd2-community/stable-diffusion-2-1)
prompt               Training prompt containing the unique identifier (a photo of a5o8 dog)
boft_target_modules  UNet attention layers to which BOFT is applied (to_q, to_k, to_v, to_out.0)
num_train_epochs     Number of training epochs
learning_rate        Learning rate for the optimizer
validation_prompts   Prompts used for periodic validation during training
inference_prompts    Prompts used for final inference
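
A minimal config.yaml along these lines might look as follows. The numeric values, the local_model_dir path, and the validation/inference prompts are illustrative, not the project's actual settings:

```yaml
pretrained_model: sd2-community/stable-diffusion-2-1
local_model_dir: models/sd21          # illustrative path; triggers auto-download on first run
prompt: a photo of a5o8 dog
boft_target_modules: [to_q, to_k, to_v, to_out.0]
num_train_epochs: 100                 # illustrative value
learning_rate: 1.0e-4                 # illustrative value
validation_prompts:                   # illustrative prompts
  - a photo of a5o8 dog in a bucket
inference_prompts:                    # illustrative prompts
  - a photo of a5o8 dog on the beach
```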

Training

uv run python -m sd21_boft.train

This will:

  1. Load the pretrained Stable Diffusion 2.1 model
  2. Apply BOFT adapters to the UNet attention layers
  3. Fine-tune on the DreamBooth dataset
  4. Save adapter checkpoints and validation images periodically
  5. Plot the training loss curve upon completion

Inference

uv run python -m sd21_boft.inference

This will generate images for each prompt in inference_prompts using both the original model and the fine-tuned model (with the latest checkpoint by default), saving results to inference_output/.

AI Usage

Several non-core modules (dependency specification, logging setup, configuration management and loading, loss-curve plotting) were written with the assistance of the AI tool GitHub Copilot (Claude Opus 4.6 model).

In addition, the README and the report were polished and formatted using GitHub Copilot (Claude Opus 4.6 model).

References

  • Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. "High-resolution image synthesis with latent diffusion models." CVPR, 2022.
  • Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., and Aberman, K. "DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation." CVPR, 2023.
  • Qiu, Z., Liu, W., Feng, H., Xue, Y., Feng, Y., Liu, Z., ... and Schölkopf, B. "Controlling text-to-image diffusion by orthogonal finetuning." NeurIPS, 2023.
  • Liu, W., Qiu, Z., Feng, Y., Xiu, Y., Xue, Y., Yu, L., ... and Schölkopf, B. "Parameter-efficient orthogonal finetuning via butterfly factorization." ICLR, 2024.
  • The training and inference code was written with reference to: HuggingFace. PEFT BOFT DreamBooth example, 2024.
  • The fine-tuning dataset is from: Google. DreamBooth dataset (dog2), 2023.
