Text-to-image fine-tuning research paper
Custom Diffusion enables fine-tuning text-to-image diffusion models like Stable Diffusion with a few images (~4-20) of a new concept. It targets researchers and developers looking to personalize generative AI models for specific objects, styles, or subjects, offering efficient customization with reduced storage overhead.
How It Works
The method fine-tunes only a subset of model parameters: the key and value projection matrices within the cross-attention layers. This selective fine-tuning significantly speeds up training (around 6 minutes on 2 A100 GPUs) and reduces the storage required for each new concept to approximately 75MB. It also supports combining multiple concepts and merging fine-tuned models.
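As a rough illustration of the idea, the sketch below freezes a Stable Diffusion UNet except for its cross-attention key/value projections. It assumes the Hugging Face diffusers model layout (cross-attention modules named attn2 with to_k/to_v projections) and is not the repository's own training script:

```python
# Sketch only: select Custom Diffusion's trainable subset in a diffusers UNet.
# Assumes diffusers' naming convention: "attn2" = cross-attention, and
# to_k / to_v = the key/value projection matrices.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)

trainable_params = []
for name, param in unet.named_parameters():
    if "attn2.to_k" in name or "attn2.to_v" in name:
        param.requires_grad_(True)   # fine-tune cross-attention K/V only
        trainable_params.append(param)
    else:
        param.requires_grad_(False)  # everything else stays frozen

# Only the small K/V subset goes to the optimizer, which is what keeps
# per-concept checkpoints small.
optimizer = torch.optim.AdamW(trainable_params, lr=1e-5)
print(f"trainable tensors: {len(trainable_params)}")
```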
Quick Start & Requirements
Clone the stable-diffusion repository, create and activate a conda environment (conda env create -f environment.yaml, conda activate ldm), and install dependencies (pip install clip-retrieval tqdm).
Download the Stable Diffusion v1.4 checkpoint (sd-v1-4.ckpt). Training is recommended on 2 A100 GPUs.
Highlighted Details
The method is also available through the Hugging Face diffusers library, including SDXL support.
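A hedged example of the diffusers route at inference time. It assumes weights saved by diffusers' Custom Diffusion training example; the directory path, weight file names, and the <new1> modifier token below are placeholders for whatever a training run actually produced:

```python
# Sketch: load Custom Diffusion weights with diffusers and generate an image.
# "path-to-model-dir", the weight file names, and the <new1> token are
# placeholders; adjust them to match your own training output.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Load the fine-tuned cross-attention K/V weights and the new-token embedding.
pipe.unet.load_attn_procs(
    "path-to-model-dir", weight_name="pytorch_custom_diffusion_weights.bin"
)
pipe.load_textual_inversion("path-to-model-dir", weight_name="<new1>.bin")

image = pipe(
    "<new1> cat sitting in a bucket",
    num_inference_steps=100,
    guidance_scale=6.0,
).images[0]
image.save("custom_cat.png")
```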
Maintenance & Community
The project is from Adobe Research and cites CVPR 2023. It is actively supported within the Hugging Face diffusers library.
Licensing & Compatibility
The repository itself does not explicitly state a license. However, it relies on the stable-diffusion repository, which is typically under a permissive license, and uses models from Hugging Face, which are also generally available for commercial use.
Limitations & Caveats
The original training scripts were developed against a specific commit of the stable-diffusion repository, which might require careful version management. Fine-tuning on human faces may require adjusted hyperparameters (lower learning rate, longer training).