AIST5030 Mini Project — Fine-tuning Stable Diffusion 2.1 using BOFT (Butterfly Orthogonal Fine-Tuning) on a DreamBooth-style dataset so the model learns a specific subject (a dog) and generates it in novel contexts.
```
├── config.yaml           # All training / inference hyperparameters
├── pyproject.toml        # Python project & dependency specification
├── data/                 # Training images (DreamBooth dog2 dataset)
├── src/sd21_boft/
│   ├── config_utils.py   # YAML config loading & model path resolution
│   ├── dataset.py        # DreamBooth dataset & data loader
│   ├── download_model.py # Download pretrained model from HuggingFace Hub
│   ├── train.py          # BOFT fine-tuning training loop
│   ├── inference.py      # Generate images with original & fine-tuned model
│   ├── log_utils.py      # Logging setup
│   └── plot_utils.py     # Training loss curve plotting
├── output/               # Saved BOFT adapter checkpoints & validation images
├── inference_output/     # Inference results (original vs. fine-tuned)
├── logs/                 # Training & inference log files, loss CSV
└── plots/                # Loss curve plot
```
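For intuition about what BOFT does: the weight update is constrained to an orthogonal matrix factored into butterfly components, which keeps the number of trainable parameters far below that of a dense matrix while preserving orthogonality. A self-contained numpy sketch of this construction (the block size, stride schedule, and Cayley parametrization here are illustrative, not PEFT's exact implementation):

```python
# Toy butterfly orthogonal factorization: compose block-diagonal orthogonal
# factors with stride permutations; the product is a d x d orthogonal matrix
# with far fewer free parameters than a dense d x d matrix.
import numpy as np

rng = np.random.default_rng(0)
d, b = 8, 2  # matrix dimension and orthogonal block size

def cayley(A):
    """Cayley transform: maps the skew-symmetric part of A to an orthogonal block."""
    S = A - A.T
    I = np.eye(len(A))
    return np.linalg.solve(I + S, I - S)  # (I + S)^-1 (I - S) is orthogonal

def block_diag_orthogonal(d, b):
    """Block-diagonal orthogonal matrix built from d/b independent Cayley blocks."""
    M = np.zeros((d, d))
    for i in range(0, d, b):
        M[i:i + b, i:i + b] = cayley(rng.standard_normal((b, b)))
    return M

def stride_perm(d, s):
    """Permutation matrix interleaving entries at stride s (the butterfly wiring)."""
    return np.eye(d)[np.arange(d).reshape(s, -1).T.reshape(-1)]

# Compose three butterfly factors; the result R stays exactly orthogonal.
R = np.eye(d)
for s in (1, 2, 4):
    P = stride_perm(d, s)
    R = P.T @ block_diag_orthogonal(d, b) @ P @ R
```

Each 2×2 Cayley block carries one free parameter, so the three factors together use 12 parameters versus 28 for a dense 8×8 orthogonal matrix; the gap widens rapidly with dimension.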
- Python >= 3.12
- NVIDIA GPU with CUDA support
- uv package manager (recommended)
```
uv sync
```

Or using pip:

```
pip install -e .
```

Download the DreamBooth dog2 dataset from `google/dreambooth` and place the images into the `data/` directory.
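The project's actual loader lives in `src/sd21_boft/dataset.py`; as a rough illustration of the DreamBooth setup, every image in `data/` is paired with the single subject prompt. A minimal sketch (class name, transforms, and return format here are illustrative, not the project's code):

```python
# Minimal DreamBooth-style dataset sketch: each image maps to one fixed prompt.
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class DreamBoothDataset(Dataset):
    def __init__(self, image_dir, prompt, size=512):
        self.paths = sorted(
            p for p in Path(image_dir).iterdir()
            if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
        )
        self.prompt = prompt
        self.size = size

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB").resize((self.size, self.size))
        pixels = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0  # scale to [-1, 1]
        return {"pixel_values": pixels.permute(2, 0, 1), "prompt": self.prompt}

# quick self-check with a synthetic image in a temporary directory
import tempfile
tmp = Path(tempfile.mkdtemp())
Image.fromarray(np.zeros((64, 64, 3), dtype=np.uint8)).save(tmp / "dog.png")
sample = DreamBoothDataset(tmp, "a photo of a5o8 dog", size=64)[0]
```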
If `local_model_dir` is set in `config.yaml`, the pretrained Stable Diffusion 2.1 model will be downloaded automatically from the HuggingFace Hub on first run. You can also trigger the download manually:
```
uv run python -m sd21_boft.download_model
```

All hyperparameters are centralized in `config.yaml`. Key settings include:
| Parameter | Description |
|---|---|
| `pretrained_model` | HuggingFace model ID (`sd2-community/stable-diffusion-2-1`) |
| `prompt` | Training prompt containing the unique identifier (`a photo of a5o8 dog`) |
| `boft_target_modules` | UNet attention layers to apply BOFT to (`to_q`, `to_k`, `to_v`, `to_out.0`) |
| `num_train_epochs` | Number of training epochs |
| `learning_rate` | Learning rate |
| `validation_prompts` | Prompts used for periodic validation during training |
| `inference_prompts` | Prompts used for final inference |
```
uv run python -m sd21_boft.train
```

This will:
- Load the pretrained Stable Diffusion 2.1 model
- Apply BOFT adapters to the UNet attention layers
- Fine-tune on the DreamBooth dataset
- Save adapter checkpoints and validation images periodically
- Plot the training loss curve upon completion
```
uv run python -m sd21_boft.inference
```

This generates images for each prompt in `inference_prompts` using both the original model and the fine-tuned model (the latest checkpoint by default), saving the results to `inference_output/`.
Some non-core modules (dependency specification, logging setup, configuration loading, loss curve plotting) were written with the assistance of GitHub Copilot (Claude Opus 4.6 model). In addition, the README and the report were polished and formatted with the same tool.
- Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. "High-resolution image synthesis with latent diffusion models." CVPR, 2022.
- Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., and Aberman, K. "DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation." CVPR, 2023.
- Qiu, Z., Liu, W., Feng, H., Xue, Y., Feng, Y., Liu, Z., ... and Schölkopf, B. "Controlling text-to-image diffusion by orthogonal finetuning." NeurIPS, 2023.
- Liu, W., Qiu, Z., Feng, Y., Xiu, Y., Xue, Y., Yu, L., ... and Schölkopf, B. "Parameter-efficient orthogonal finetuning via butterfly factorization." ICLR, 2024.
- The training and inference code is written with reference to: HuggingFace. PEFT BOFT DreamBooth example, 2024.
- The finetuning dataset is from: Google. DreamBooth dataset — dog2, 2023.