oft by zqiu24

Research paper on orthogonal finetuning for text-to-image diffusion models

created 2 years ago
292 stars

Top 91.4% on sourcepulse

Project Summary

This repository provides the official implementation of Orthogonal Finetuning (OFT), a method for adapting large text-to-image diffusion models to downstream tasks like subject-driven generation and controllable generation. It aims to preserve the model's semantic generation ability by maintaining hyperspherical energy, outperforming existing methods in quality and convergence speed.

How It Works

OFT is a principled finetuning approach that provably preserves hyperspherical energy, a quantity tied to the semantic generation capability of diffusion models. Instead of updating weights directly, it learns layer-wise orthogonal transformations that rotate all neurons of a layer jointly, so their pairwise angles on the unit hypersphere are preserved. The Constrained Orthogonal Finetuning (COFT) variant additionally bounds how far the transformation may deviate from the identity, which improves stability. The method is also integrated into Hugging Face PEFT.
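
For intuition, here is a minimal sketch (hypothetical code, not the repository's implementation) of this kind of block-diagonal orthogonal update: each of the r diagonal blocks is produced by a Cayley transform, so neuron directions are only rotated, and an optional eps bound mimics the COFT radius constraint by limiting each block's deviation from the identity.

```python
from typing import Optional

import torch


def cayley(q: torch.Tensor) -> torch.Tensor:
    """Cayley transform: map an unconstrained square matrix to an orthogonal matrix."""
    skew = 0.5 * (q - q.transpose(-1, -2))             # skew-symmetric part
    eye = torch.eye(q.shape[-1], dtype=q.dtype, device=q.device)
    return torch.linalg.solve(eye + skew, eye - skew)  # (I + S)^{-1} (I - S)


def oft_update(weight: torch.Tensor, q_blocks: torch.Tensor,
               eps: Optional[float] = None) -> torch.Tensor:
    """Rotate a pretrained weight with a block-diagonal orthogonal matrix.

    weight:   (out_features, in_features) pretrained matrix; rows are treated as neurons here
    q_blocks: (r, b, b) trainable block parameters, with r * b == out_features
    eps:      if set (COFT-style), cap each block's deviation from the identity
    """
    r, b, _ = q_blocks.shape
    eye = torch.eye(b, dtype=weight.dtype, device=weight.device)
    blocks = []
    for q in q_blocks:
        rot = cayley(q)
        if eps is not None:                             # crude COFT-style radius constraint
            delta = rot - eye
            norm = delta.norm()
            if norm > eps:
                rot = eye + delta * (eps / norm)
        blocks.append(rot)
    R = torch.block_diag(*blocks)                       # (out_features, out_features)
    return R @ weight                                   # neurons are rotated, pairwise angles preserved


# Example: a 512 x 768 layer split into r = 4 blocks of size 128.
w = torch.randn(512, 768)
q = torch.zeros(4, 128, 128)   # zero init -> R = I, i.e. finetuning starts at the pretrained weights
w_new = oft_update(w, q, eps=6e-5)
```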

Quick Start & Requirements

  • Installation: Clone the repository and create a conda environment using environment.yml.
  • Data: Download preprocessed data using scripts in the scripts folder (e.g., scripts/dataset_setup_control_deepfashion.sh, scripts/dataset_setup_db_dreambooth.sh). Requires agreeing to third-party licenses.
  • Model Weights: Download v1-5-pruned.ckpt and place it in the models directory.
  • Usage: Key hyperparameters include r (number of blocks) and eps (eps-deviation for COFT).
  • Resources: Requires significant storage for datasets and model weights; training requires a GPU.
  • Docs: Hugging Face PEFT Doc (a usage sketch follows this list).
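
The sketch below shows how the PEFT integration might be exercised. It assumes a peft release that ships OFTConfig (the field names follow the PEFT documentation and may differ across versions) and uses a tiny stand-in module instead of a real diffusion UNet.

```python
import torch.nn as nn
from peft import OFTConfig, get_peft_model


class TinyAttentionStub(nn.Module):
    """Tiny stand-in for the attention projections OFT would target inside a UNet."""

    def __init__(self, dim: int = 320):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, x):
        return self.to_q(x) + self.to_v(x)


config = OFTConfig(
    r=4,                              # number of diagonal blocks per adapted layer
    target_modules=["to_q", "to_v"],  # module names to wrap with OFT layers
    coft=True,                        # enable the constrained variant (COFT)
    eps=6e-5,                         # eps-deviation budget used by COFT
)

model = get_peft_model(TinyAttentionStub(), config)
model.print_trainable_parameters()    # only the orthogonal block parameters are trainable
```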

Highlighted Details

  • Implemented for both controllable generation (ControlNet-like) and subject-driven generation (Dreambooth-like) tasks.
  • Offers a post-training magnitude fitting step (train_with_norm.py) that additionally fits neuron magnitudes after orthogonal finetuning (see the sketch after this list).
  • Includes evaluation scripts for various tasks (e.g., eval_landmark.py, eval_canny.py).
  • A toy experiment demonstrates the importance of angular information.
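
As a rough illustration of the magnitude-fitting idea (hypothetical code, not the repository's train_with_norm.py): once orthogonal finetuning has fixed the neuron directions, only a per-neuron scale remains trainable.

```python
import torch
import torch.nn as nn


class MagnitudeFit(nn.Module):
    """Wrap an OFT-finetuned linear layer and learn only per-neuron output scales."""

    def __init__(self, finetuned_linear: nn.Linear):
        super().__init__()
        self.linear = finetuned_linear
        for p in self.linear.parameters():
            p.requires_grad_(False)                    # directions stay fixed
        self.scale = nn.Parameter(torch.ones(finetuned_linear.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * self.linear(x)             # only magnitudes are trained


# Example: fit magnitudes of a (hypothetical) finetuned 320 -> 320 projection.
layer = MagnitudeFit(nn.Linear(320, 320))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 320
```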

Maintenance & Community

  • Initial commit on June 23, 2023.
  • TODO list includes a faster OFT version and more applications.
  • Builds upon the LoRA, ControlNet, Diffusers, and OPT projects.

Licensing & Compatibility

  • The README does not state a license for the repository itself. However, the project depends on code and data from other projects that carry their own licenses, and users must agree to those third-party licenses.

Limitations & Caveats

  • The README notes that using more blocks (larger r) can degrade finetuning results.
  • Evaluation for segmentation map-to-image tasks requires installing the Segformer repository.
  • The project is actively under development with pending features and improvements.
Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
