Text-to-image research paper improving Stable Diffusion via human preference learning
Top 92.1% on sourcepulse
This repository provides code and models for aligning text-to-image generation with human preferences, addressing issues like artifacts and misinterpretations of user intent in models like Stable Diffusion. It's targeted at researchers and developers looking to improve image generation quality and user satisfaction.
How It Works
The project introduces a Human Preference Classifier (HPC) trained on human choices from Discord to predict which of two generated images better matches a given prompt. This classifier is then used to fine-tune Stable Diffusion, specifically through LoRA adapters, to align its output with these learned human preferences, resulting in higher quality and more intent-aligned images.
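The pairwise setup can be sketched as follows. The encoders here are random projections standing in for the fine-tuned CLIP backbone, so every name and shape below is an assumption for illustration, not the repository's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the classifier's image/text encoders. The real HPC
# fine-tunes a CLIP backbone; these random projections only illustrate
# the pairwise-scoring logic.
W_img = rng.standard_normal((256, 512))
W_txt = rng.standard_normal((256, 512))

def embed(W, x):
    v = W @ x
    return v / np.linalg.norm(v)

def preference_score(img_vec, txt_vec):
    # Cosine similarity between normalized embeddings: higher = closer
    # to the prompt.
    return float(embed(W_img, img_vec) @ embed(W_txt, txt_vec))

prompt = rng.standard_normal(512)
img_a, img_b = rng.standard_normal(512), rng.standard_normal(512)
scores = np.array([preference_score(img_a, prompt),
                   preference_score(img_b, prompt)])

# Softmax over the pair: probability that each image is the human choice.
probs = np.exp(scores) / np.exp(scores).sum()
winner = int(probs.argmax())  # index of the preferred image
```

During fine-tuning, a signal of this form rewards the model for generating images the classifier predicts humans would pick.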
Quick Start & Requirements
- Inference: python generate_images.py --unet_weight /path/to/checkpoint.bin --prompts /path/to/prompt_list.json --folder /path/to/output/folder
- Gradio demo: download the classifier checkpoint (.pth) and the LoRA checkpoint, run pip install -r gradio_requirements.txt, then python app_gradio.py
- Training: install requirements.txt, prepare regularization images, and configure accelerate for distributed training.
Highlighted Details
Maintenance & Community
The project is associated with ICCV 2023. Further community engagement details (Discord/Slack, roadmap) are not explicitly provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The code appears to be compatible with standard Python environments and Hugging Face's Diffusers library.
Limitations & Caveats
The training script requires significant data preparation and configuration, including downloading large datasets and setting up distributed training via accelerate. Inference depends on using the specific negative prompt "Weird image." to reproduce the reported image quality.
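For context on why the negative prompt matters mechanically: in classifier-free guidance, a negative prompt takes the place of the unconditional branch, so each denoising step is steered away from it. A toy NumPy sketch of that arithmetic (stand-in vectors, not real UNet outputs):

```python
import numpy as np

def cfg(eps_cond, eps_neg, scale):
    # Classifier-free guidance: start from the negative-prompt branch and
    # push the prediction toward the positive prompt's direction.
    return eps_neg + scale * (eps_cond - eps_neg)

rng = np.random.default_rng(1)
# Stand-ins for UNet noise predictions on the same latent, conditioned on
# the user's prompt vs. the negative prompt "Weird image.".
eps_cond = rng.standard_normal(4)
eps_neg = rng.standard_normal(4)

eps_guided = cfg(eps_cond, eps_neg, scale=7.5)
```

At scale 1.0 the guided prediction reduces to the prompt-conditioned one; larger scales amplify the push away from "Weird image.", which is why omitting the negative prompt changes output quality.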