ddpo-pytorch by kvablack

PyTorch implementation of DDPO for diffusion model finetuning

Created 2 years ago

710 stars

Top 48.2% on SourcePulse

Project Summary

This repository implements Denoising Diffusion Policy Optimization (DDPO) in PyTorch for finetuning diffusion models, specifically Stable Diffusion. Users supply their own prompts and reward function, and finetuning steers image generation toward whatever that reward measures, whether the goal is aesthetic or functional.

How It Works

DDPO frames diffusion model finetuning as a reinforcement learning problem. It samples images from the diffusion model, scores them with a reward function, and then updates the model's parameters, which act as the policy, to maximize expected reward. The implementation uses LoRA for efficient finetuning, which significantly reduces memory requirements.
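The sample, score, update loop described above can be illustrated with a small sketch. It assumes the importance-sampling variant of DDPO with PPO-style ratio clipping, and it uses batch-normalized rewards as advantages. The function names `normalize_rewards` and `clipped_surrogate_loss` and the `clip_range` values are hypothetical, and plain Python floats stand in for PyTorch tensors so the example runs without a GPU. It is not the repository's actual code.

```python
import math
from statistics import mean, pstdev


def normalize_rewards(rewards, eps=1e-8):
    """Turn raw rewards into zero-mean, unit-variance advantages (assumed scheme)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_range=0.2):
    """PPO-style clipped loss over denoising-step log-probabilities.

    new_log_probs: log p_theta(x_{t-1} | x_t) under the current parameters
    old_log_probs: the same quantity recorded when the images were sampled
    advantages:    one normalized reward per entry
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Importance ratio between the current policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        # Taking the larger (more pessimistic) loss caps the gain from
        # moving far away from the policy that generated the samples.
        losses.append(max(-adv * ratio, -adv * clipped_ratio))
    return mean(losses)


# Toy usage: three sampled images with scalar rewards.
advantages = normalize_rewards([1.0, 2.0, 3.0])
loss = clipped_surrogate_loss([-1.0, -1.1, -0.9], [-1.0, -1.0, -1.0], advantages)
```

In a real training run the loss would be computed on tensors and backpropagated through the LoRA weights only, with the base model frozen.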

Quick Start & Requirements

Install: clone the repository, then run pip install -e . from the repository root.
Requires Python 3.10+.
GPU memory: <10GB with LoRA enabled for Stable Diffusion finetuning.
Official quick-start: https://github.com/kvablack/ddpo-pytorch

Highlighted Details

Low GPU memory requirement (<10GB) with LoRA for Stable Diffusion finetuning.
Supports custom prompt and reward functions for tailored image generation.
Also available through the Hugging Face trl library as a DDPOTrainer.
Configuration files (config/base.py, config/dgx.py) provide example settings.
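To make the custom prompt and reward hooks more concrete, here is a hedged sketch of what such a pair of functions might look like. The signatures are assumptions for illustration: a prompt function returning a prompt string plus a metadata dict, and a reward function taking images, prompts, and metadata and returning per-image rewards plus an info dict. The names `simple_animals_prompt` and `mean_brightness_reward` are hypothetical, and nested lists of floats stand in for image tensors. Check the repository for the interface it actually expects.

```python
import random

ANIMALS = ["cat", "dog", "horse"]  # hypothetical prompt vocabulary


def simple_animals_prompt(rng=random):
    """Hypothetical prompt function: returns (prompt, metadata)."""
    animal = rng.choice(ANIMALS)
    return f"a photo of a {animal}", {"animal": animal}


def mean_brightness_reward(images, prompts, metadata):
    """Hypothetical reward function: mean pixel intensity per image.

    images: list of 2D lists with pixel values in [0, 1]
    Returns (rewards, info), where info can carry extra logging data.
    """
    rewards = []
    for image in images:
        pixels = [p for row in image for p in row]
        rewards.append(sum(pixels) / len(pixels))
    return rewards, {}


# Toy usage with 2x2 "images".
prompt, meta = simple_animals_prompt(random.Random(0))
dark = [[0.0, 0.0], [0.0, 0.0]]
bright = [[1.0, 1.0], [1.0, 1.0]]
mixed = [[0.0, 1.0], [1.0, 0.0]]
rewards, info = mean_brightness_reward([dark, bright, mixed], [prompt] * 3, [meta] * 3)
```

Keeping the reward as a plain function of the generated images makes it easy to swap objectives without touching the training loop.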

Maintenance & Community

The trl integration was contributed by @metric-space.
Supplementary blog post available for guidance.

Licensing & Compatibility

License not explicitly stated in the README.

Limitations & Caveats

Default hyperparameters are not optimized for performance and require adjustment for good results.
LLaVA prompt-image alignment experiments require dedicated GPUs for LLaVA inference.

Health Check

Last commit: 1 year ago
Responsiveness: Inactive
Star history: 16 stars in the last 30 days