align_sd  by tgxs002

Text-to-image research paper improving Stable Diffusion via human preference learning

created 2 years ago
288 stars

Top 92.1% on sourcepulse

Project Summary

This repository provides code and models for aligning text-to-image generation with human preferences, addressing issues like artifacts and misinterpretations of user intent in models like Stable Diffusion. It's targeted at researchers and developers looking to improve image generation quality and user satisfaction.

How It Works

The project introduces a Human Preference Classifier (HPC) trained on human choices from Discord to predict which of two generated images better matches a given prompt. This classifier is then used to fine-tune Stable Diffusion, specifically through LoRA adapters, to align its output with these learned human preferences, resulting in higher quality and more intent-aligned images.
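The pairwise comparison described above can be illustrated with a small sketch. It is not the repository's implementation. It assumes a CLIP-style setup in which the prompt and each image are embedded as vectors, each image is scored by cosine similarity to the prompt, and a softmax over the two scores gives the probability that each image is preferred. The function names, the temperature parameter, and the toy vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def preference_probabilities(prompt_emb, image_embs, temperature=1.0):
    """Softmax over prompt-image similarities (assumed scoring rule)."""
    scores = [cosine_similarity(prompt_emb, e) / temperature for e in image_embs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_preference_loss(prompt_emb, preferred_emb, rejected_emb, temperature=1.0):
    """Negative log-likelihood of the image the human actually chose."""
    probs = preference_probabilities(
        prompt_emb, [preferred_emb, rejected_emb], temperature
    )
    return -math.log(probs[0])


# Toy embeddings standing in for real text and image encoder outputs.
prompt = [1.0, 0.0, 0.0]
image_a = [0.9, 0.1, 0.0]  # close to the prompt direction
image_b = [0.1, 0.9, 0.0]  # far from the prompt direction

probs = preference_probabilities(prompt, [image_a, image_b], temperature=0.1)
loss_if_a_chosen = pairwise_preference_loss(prompt, image_a, image_b, temperature=0.1)
loss_if_b_chosen = pairwise_preference_loss(prompt, image_b, image_a, temperature=0.1)
print(probs, loss_if_a_chosen, loss_if_b_chosen)
```

Minimizing this loss over many human choices pushes the scorer to rank the chosen image higher. A scorer trained this way can then be used to label or weight generated images when fine-tuning the generator.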

Quick Start & Requirements

  • Inference: python generate_images.py --unet_weight /path/to/checkpoint.bin --prompts /path/to/prompt_list.json --folder /path/to/output/folder
  • Prerequisites: Python, PyTorch, Diffusers, CLIP environment. Requires a downloaded HPC checkpoint (.pth) and LoRA checkpoint.
  • Gradio Demo: pip install -r gradio_requirements.txt then python app_gradio.py.
  • Training: Install the dependencies in requirements.txt; also requires regularization images and accelerate for distributed training.
  • Links: Project page, arXiv, Space demo
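As a rough illustration of the inference step, the sketch below writes a prompt list and assembles the generate_images.py command shown above. The flag names come from the summary. The prompt file format (a flat JSON array of strings) is an assumption that should be checked against the repository, and both helper functions are hypothetical.

```python
import json
import sys
import tempfile
from pathlib import Path


def write_prompt_list(prompts, path):
    """Write prompts as a flat JSON array (assumed format, verify in the repo)."""
    path = Path(path)
    path.write_text(json.dumps(prompts, indent=2), encoding="utf-8")
    return path


def build_inference_command(unet_weight, prompts_file, output_folder):
    """Assemble the argument list for generate_images.py."""
    return [
        sys.executable,
        "generate_images.py",
        "--unet_weight", str(unet_weight),
        "--prompts", str(prompts_file),
        "--folder", str(output_folder),
    ]


workdir = Path(tempfile.mkdtemp())
prompts_file = write_prompt_list(
    ["a photo of a corgi wearing sunglasses"],
    workdir / "prompt_list.json",
)
command = build_inference_command(
    "/path/to/checkpoint.bin",  # placeholder path, as in the summary
    prompts_file,
    workdir / "outputs",
)
print(" ".join(command))
# To run it from the repository root:
#   import subprocess; subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.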

Highlighted Details

  • Improves Stable Diffusion by learning from human preferences, reducing artifacts like "weird limbs."
  • Offers a pre-trained Human Preference Classifier (HPC) and LoRA checkpoints for adapted Stable Diffusion models.
  • Includes scripts for data processing, training, and inference, with a Gradio demo available.
  • Training utilizes regularization images and a human preference dataset collected from Discord.

Maintenance & Community

The project is associated with ICCV 2023. Further community engagement details (Discord/Slack, roadmap) are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code appears to be compatible with standard Python environments and Hugging Face's Diffusers library.

Limitations & Caveats

The training script requires significant data preparation and configuration, including downloading large datasets and setting up distributed training via accelerate. The README also specifies that the negative prompt "Weird image." should be supplied during inference.
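If the adapted weights are used through a Diffusers pipeline directly rather than through generate_images.py, the negative prompt has to be passed explicitly. The sketch below only prepares the call arguments. The helper name is hypothetical, and whether generate_images.py already applies this negative prompt internally is not stated here.

```python
NEGATIVE_PROMPT = "Weird image."  # negative prompt named in the summary


def build_generation_kwargs(prompt):
    """Keyword arguments for a Stable Diffusion pipeline call."""
    return {"prompt": prompt, "negative_prompt": NEGATIVE_PROMPT}


kwargs = build_generation_kwargs("a portrait of an astronaut")
print(kwargs)
# With an already loaded diffusers StableDiffusionPipeline named `pipe`:
#   image = pipe(**kwargs).images[0]
```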

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days
