Text-to-image research paper improving Stable Diffusion via human preference learning
Top 92.1% on sourcepulse
This repository provides code and models for aligning text-to-image generation with human preferences, addressing issues like artifacts and misinterpretations of user intent in models like Stable Diffusion. It's targeted at researchers and developers looking to improve image generation quality and user satisfaction.
How It Works
The project introduces a Human Preference Classifier (HPC) trained on human choices from Discord to predict which of two generated images better matches a given prompt. This classifier is then used to fine-tune Stable Diffusion, specifically through LoRA adapters, to align its output with these learned human preferences, resulting in higher quality and more intent-aligned images.
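The pairwise setup can be sketched as follows. The encoders here are random projections standing in for the fine-tuned CLIP backbone, so every name and shape below is an assumption for illustration, not the repository's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the classifier's image/text encoders. The real HPC
# fine-tunes a CLIP backbone; these random projections only illustrate
# the pairwise-scoring logic.
W_img = rng.standard_normal((256, 512))
W_txt = rng.standard_normal((256, 512))

def embed(W, x):
    v = W @ x
    return v / np.linalg.norm(v)

def preference_score(img_vec, txt_vec):
    # Cosine similarity between normalized embeddings: higher = closer
    # to the prompt.
    return float(embed(W_img, img_vec) @ embed(W_txt, txt_vec))

prompt = rng.standard_normal(512)
img_a, img_b = rng.standard_normal(512), rng.standard_normal(512)
scores = np.array([preference_score(img_a, prompt),
                   preference_score(img_b, prompt)])

# Softmax over the pair: probability that each image is the human choice.
probs = np.exp(scores) / np.exp(scores).sum()
winner = int(probs.argmax())  # index of the preferred image
```

During fine-tuning, a signal of this form rewards the model for generating images the classifier predicts humans would pick.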
Quick Start & Requirements
- Inference: python generate_images.py --unet_weight /path/to/checkpoint.bin --prompts /path/to/prompt_list.json --folder /path/to/output/folder
- Gradio demo: download the classifier checkpoint (.pth) and the LoRA checkpoint, run pip install -r gradio_requirements.txt, then python app_gradio.py
- Training: install requirements.txt, prepare regularization images, and configure accelerate for distributed training.
Highlighted Details
Maintenance & Community
The project is associated with ICCV 2023. Further community engagement details (Discord/Slack, roadmap) are not explicitly provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The code appears to be compatible with standard Python environments and Hugging Face's Diffusers library.
Limitations & Caveats
The training script requires significant data preparation and configuration, including downloading large datasets and setting up distributed training via accelerate. Inference depends on using the specific negative prompt "Weird image." to reproduce the reported image quality.
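For context on why the negative prompt matters mechanically: in classifier-free guidance, a negative prompt takes the place of the unconditional branch, so each denoising step is steered away from it. A toy NumPy sketch of that arithmetic (stand-in vectors, not real UNet outputs):

```python
import numpy as np

def cfg(eps_cond, eps_neg, scale):
    # Classifier-free guidance: start from the negative-prompt branch and
    # push the prediction toward the positive prompt's direction.
    return eps_neg + scale * (eps_cond - eps_neg)

rng = np.random.default_rng(1)
# Stand-ins for UNet noise predictions on the same latent, conditioned on
# the user's prompt vs. the negative prompt "Weird image.".
eps_cond = rng.standard_normal(4)
eps_neg = rng.standard_normal(4)

eps_guided = cfg(eps_cond, eps_neg, scale=7.5)
```

At scale 1.0 the guided prediction reduces to the prompt-conditioned one; larger scales amplify the push away from "Weird image.", which is why omitting the negative prompt changes output quality.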