oft by zqiu24

Research paper on orthogonal finetuning for text-to-image diffusion models

created 2 years ago
292 stars

Top 91.4% on sourcepulse

Project Summary

This repository provides the official implementation of Orthogonal Finetuning (OFT), a method for adapting large text-to-image diffusion models to downstream tasks like subject-driven generation and controllable generation. It aims to preserve the model's semantic generation ability by maintaining hyperspherical energy, outperforming existing methods in quality and convergence speed.

How It Works

OFT is a principled finetuning approach that provably preserves hyperspherical energy, a quantity tied to the semantic generation capability of diffusion models. Instead of updating weights directly, it learns layer-wise orthogonal transformations that rotate all neurons of a layer jointly, so their pairwise angles on the unit hypersphere are preserved. The Constrained Orthogonal Finetuning (COFT) variant additionally bounds how far the transformation may deviate from the identity, which improves stability. The method is also integrated into Hugging Face PEFT.
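
For intuition, here is a minimal sketch (hypothetical code, not the repository's implementation) of this kind of block-diagonal orthogonal update: each of the r diagonal blocks is produced by a Cayley transform, so neuron directions are only rotated, and an optional eps bound mimics the COFT radius constraint by limiting each block's deviation from the identity.

```python
from typing import Optional

import torch


def cayley(q: torch.Tensor) -> torch.Tensor:
    """Cayley transform: map an unconstrained square matrix to an orthogonal matrix."""
    skew = 0.5 * (q - q.transpose(-1, -2))             # skew-symmetric part
    eye = torch.eye(q.shape[-1], dtype=q.dtype, device=q.device)
    return torch.linalg.solve(eye + skew, eye - skew)  # (I + S)^{-1} (I - S)


def oft_update(weight: torch.Tensor, q_blocks: torch.Tensor,
               eps: Optional[float] = None) -> torch.Tensor:
    """Rotate a pretrained weight with a block-diagonal orthogonal matrix.

    weight:   (out_features, in_features) pretrained matrix; rows are treated as neurons here
    q_blocks: (r, b, b) trainable block parameters, with r * b == out_features
    eps:      if set (COFT-style), cap each block's deviation from the identity
    """
    r, b, _ = q_blocks.shape
    eye = torch.eye(b, dtype=weight.dtype, device=weight.device)
    blocks = []
    for q in q_blocks:
        rot = cayley(q)
        if eps is not None:                             # crude COFT-style radius constraint
            delta = rot - eye
            norm = delta.norm()
            if norm > eps:
                rot = eye + delta * (eps / norm)
        blocks.append(rot)
    R = torch.block_diag(*blocks)                       # (out_features, out_features)
    return R @ weight                                   # neurons are rotated, pairwise angles preserved


# Example: a 512 x 768 layer split into r = 4 blocks of size 128.
w = torch.randn(512, 768)
q = torch.zeros(4, 128, 128)   # zero init -> R = I, i.e. finetuning starts at the pretrained weights
w_new = oft_update(w, q, eps=6e-5)
```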

Quick Start & Requirements

  • Installation: Clone the repository and create a conda environment using environment.yml.
  • Data: Download preprocessed data using scripts in the scripts folder (e.g., scripts/dataset_setup_control_deepfashion.sh, scripts/dataset_setup_db_dreambooth.sh). Requires agreeing to third-party licenses.
  • Model Weights: Download v1-5-pruned.ckpt and place it in the models directory.
  • Usage: Key hyperparameters include r (number of blocks) and eps (eps-deviation for COFT).
  • Resources: Requires significant storage for datasets and model weights; training requires a GPU.
  • Docs: Hugging Face PEFT Doc (a usage sketch follows this list).
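
The sketch below shows how the PEFT integration might be exercised. It assumes a peft release that ships OFTConfig (the field names follow the PEFT documentation and may differ across versions) and uses a tiny stand-in module instead of a real diffusion UNet.

```python
import torch.nn as nn
from peft import OFTConfig, get_peft_model


class TinyAttentionStub(nn.Module):
    """Tiny stand-in for the attention projections OFT would target inside a UNet."""

    def __init__(self, dim: int = 320):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, x):
        return self.to_q(x) + self.to_v(x)


config = OFTConfig(
    r=4,                              # number of diagonal blocks per adapted layer
    target_modules=["to_q", "to_v"],  # module names to wrap with OFT layers
    coft=True,                        # enable the constrained variant (COFT)
    eps=6e-5,                         # eps-deviation budget used by COFT
)

model = get_peft_model(TinyAttentionStub(), config)
model.print_trainable_parameters()    # only the orthogonal block parameters are trainable
```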

Highlighted Details

  • Implemented for both controllable generation (ControlNet-like) and subject-driven generation (Dreambooth-like) tasks.
  • Offers a post-training magnitude fitting step (train_with_norm.py) that additionally fits neuron magnitudes after orthogonal finetuning (see the sketch after this list).
  • Includes evaluation scripts for various tasks (e.g., eval_landmark.py, eval_canny.py).
  • A toy experiment demonstrates the importance of angular information.
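
As a rough illustration of the magnitude-fitting idea (hypothetical code, not the repository's train_with_norm.py): once orthogonal finetuning has fixed the neuron directions, only a per-neuron scale remains trainable.

```python
import torch
import torch.nn as nn


class MagnitudeFit(nn.Module):
    """Wrap an OFT-finetuned linear layer and learn only per-neuron output scales."""

    def __init__(self, finetuned_linear: nn.Linear):
        super().__init__()
        self.linear = finetuned_linear
        for p in self.linear.parameters():
            p.requires_grad_(False)                    # directions stay fixed
        self.scale = nn.Parameter(torch.ones(finetuned_linear.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * self.linear(x)             # only magnitudes are trained


# Example: fit magnitudes of a (hypothetical) finetuned 320 -> 320 projection.
layer = MagnitudeFit(nn.Linear(320, 320))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 320
```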

Maintenance & Community

  • Initial commit on June 23, 2023.
  • TODO list includes a faster OFT version and more applications.
  • Builds upon the LoRA, ControlNet, Diffusers, and OPT projects.

Licensing & Compatibility

  • The README does not state a license for the repository itself. However, the project depends on code and data from other projects that carry their own licenses, and users must agree to those third-party licenses.

Limitations & Caveats

  • The README notes that using more blocks (larger r) can degrade finetuning results.
  • Evaluation for segmentation map-to-image tasks requires installing the Segformer repository.
  • The project is actively under development with pending features and improvements.
Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
