Fine-tuning method for text-to-image diffusion models
Top 91.2% on sourcepulse
AlignProp offers a more efficient method for fine-tuning text-to-image diffusion models to align with specific reward functions, such as aesthetic quality or semantic accuracy. Targeting researchers and practitioners working with large diffusion models, it provides a computationally and sample-efficient alternative to reinforcement learning approaches like PPO.
How It Works
AlignProp utilizes direct reward backpropagation through the diffusion model's denoising process. To manage memory constraints, it fine-tunes low-rank adapter weight modules and employs gradient checkpointing. This approach allows for end-to-end optimization against differentiable reward functions, simplifying the alignment process compared to RL methods.
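The core idea, backpropagating a differentiable reward through an unrolled denoising chain, can be illustrated with a minimal sketch. This is not the repository's actual code: the toy denoiser stands in for a LoRA-adapted UNet, the reward is a placeholder for an aesthetic or HPSv2 scorer, and the update rule is a simplified scheduler step.

```python
# Toy sketch of AlignProp-style reward backpropagation (assumptions:
# ToyDenoiser replaces the real UNet, reward() replaces an aesthetic model).
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for a (LoRA-adapted) UNet: predicts a denoising update for x_t."""
    def __init__(self, dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, dim))

    def forward(self, x):
        return self.net(x)

def reward(x):
    # Differentiable surrogate reward; here we simply reward samples
    # close to the origin, in place of a learned aesthetic score.
    return -x.pow(2).mean()

torch.manual_seed(0)
model = ToyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

K = 4  # number of denoising steps to backprop through (K=1 saves memory)
losses = []
for step in range(50):
    x = torch.randn(16, 8)          # start from pure noise
    for _ in range(K):              # unrolled denoising chain
        x = x - 0.1 * model(x)      # simplified update; real code uses a scheduler
    loss = -reward(x)               # maximize reward end-to-end
    opt.zero_grad()
    loss.backward()                 # gradients flow through all K steps
    opt.step()
    losses.append(loss.item())
```

In the real setting the chain is far longer, which is why the method relies on low-rank adapters and gradient checkpointing (or truncating to K=1) to keep the backward pass within memory.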
Quick Start & Requirements
Create a conda environment (`conda create -n alignprop python=3.10`) and install dependencies (`pip install -r requirements.txt`). For memory-constrained setups, reduce `train_batch_size` or use K=1. Launch scripts `aesthetic.sh` and `hps.sh` are provided for the aesthetic and HPSv2 reward models, respectively; variants for memory-constrained environments (`_k1.sh`) are also available.
Highlighted Details
Maintenance & Community
The codebase is built upon DDPO. No specific community channels or roadmap are detailed in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Without one, default copyright applies, which may restrict commercial use or closed-source redistribution.
Limitations & Caveats
The project is presented as an official implementation of a research paper, suggesting it may be experimental. Specific details on long-term maintenance or community support are not provided.