Research paper on orthogonal finetuning for text-to-image diffusion models
This repository provides the official implementation of Orthogonal Finetuning (OFT), a method for adapting large text-to-image diffusion models to downstream tasks like subject-driven generation and controllable generation. It aims to preserve the model's semantic generation ability by maintaining hyperspherical energy, outperforming existing methods in quality and convergence speed.
How It Works
OFT is a principled finetuning approach that provably preserves hyperspherical energy, which is crucial for maintaining the semantic generation capabilities of diffusion models. It achieves this by constraining neuron relationships on a unit hypersphere. The Constrained Orthogonal Finetuning (COFT) variant adds a radius constraint for improved stability. The method is integrated into Hugging Face PEFT.
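As a concrete illustration (a minimal sketch, not the repository's code), the PyTorch snippet below maps trainable parameters to a block-diagonal orthogonal matrix via the Cayley transform and applies it to a frozen pretrained weight; the neurons are only rotated, so their pairwise angles (hyperspherical energy) are preserved. Names such as cayley and theta are illustrative.

```python
import torch

def cayley(param: torch.Tensor) -> torch.Tensor:
    """Map an unconstrained square matrix to an orthogonal matrix via the
    Cayley transform R = (I - Q)^{-1}(I + Q), where Q is skew-symmetric."""
    q = 0.5 * (param - param.transpose(-1, -2))       # skew-symmetrize
    eye = torch.eye(q.shape[-1], dtype=q.dtype, device=q.device)
    return torch.linalg.solve(eye - q, eye + q)       # solves (I - Q) R = (I + Q)

# Illustrative shapes: rotate the neurons of a frozen pretrained weight with a
# block-diagonal orthogonal matrix built from r blocks.
d_out, d_in, r = 8, 16, 2
w0 = torch.randn(d_out, d_in)                                        # frozen pretrained weight
theta = torch.zeros(r, d_out // r, d_out // r, requires_grad=True)   # trainable OFT parameters

R = torch.block_diag(*[cayley(theta[i]) for i in range(r)])  # equals the identity at init
w_adapted = R @ w0  # neurons are rotated, never rescaled, so hyperspherical energy is unchanged
```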
Quick Start & Requirements
Create the environment from environment.yml.
Dataset setup scripts are provided in the scripts folder (e.g., scripts/dataset_setup_control_deepfashion.sh, scripts/dataset_setup_db_dreambooth.sh); running them requires agreeing to third-party licenses.
Download v1-5-pruned.ckpt and place it in the models directory.
Key hyperparameters are r (number of blocks) and eps (eps-deviation for COFT).
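Because OFT is integrated into Hugging Face PEFT, adapters can also be attached through OFTConfig. A minimal sketch, assuming a recent PEFT release whose OFTConfig exposes r, coft, and eps (argument names can differ between versions) and using a CLIP text encoder purely for illustration:

```python
from peft import OFTConfig, get_peft_model
from transformers import CLIPTextModel

base = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
config = OFTConfig(
    r=4,                                  # number of diagonal blocks
    target_modules=["q_proj", "v_proj"],  # which linear layers receive OFT adapters
    coft=True,                            # constrained variant (COFT)
    eps=6e-5,                             # eps-deviation allowed by COFT
)
model = get_peft_model(base, config)
model.print_trainable_parameters()        # only the orthogonal adapter parameters train
```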
Highlighted Details
Includes an additional training script (train_with_norm.py) to improve neuron magnitude (see the sketch below).
Provides evaluation scripts (eval_landmark.py, eval_canny.py).
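The exact behaviour of the norm-finetuning variant lives in train_with_norm.py; as a rough, hypothetical sketch of the idea, a learnable per-neuron magnitude can be combined with the orthogonal rotation so that norms, not just directions, can adapt:

```python
import torch

d_out, d_in = 8, 16
w0 = torch.randn(d_out, d_in)                        # frozen pretrained weight
R = torch.eye(d_out)                                 # learned orthogonal factor (e.g., from a Cayley transform)
log_scale = torch.zeros(d_out, requires_grad=True)   # trainable per-neuron log-magnitude

# The rotation preserves neuron directions; the extra scale lets magnitudes adapt.
w_adapted = torch.exp(log_scale).unsqueeze(1) * (R @ w0)
```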
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
With an unsuitable choice of r (number of blocks), finetuning results can worsen.