NVlabs/DiffusionNFT: RL for diffusion models via forward process optimization
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Algorithm Overview
Summary
DiffusionNFT introduces a novel online reinforcement learning paradigm for diffusion models, optimizing policies directly on the forward diffusion process. It targets researchers and practitioners seeking a solver-agnostic, theoretically consistent, and memory-efficient method for fine-tuning diffusion models with reward signals, offering straightforward integration into existing flow-matching pipelines.
How It Works
The core innovation is optimizing the training policy ($v_\theta$) directly on noised versions of collected images, with a loss that weights positive and negative objectives according to sample rewards. Because no full sampling trajectories are needed, only clean images are required for training. Compatibility with any black-box sampler and adherence to the standard flow-matching objective simplify integration and enhance flexibility.
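To make the idea above concrete, here is a minimal sketch of a reward-weighted flow-matching update on noised clean images. The function name, the velocity-prediction interface `v_theta(x_t, t)`, and the simple centered-reward weighting are illustrative assumptions; the actual DiffusionNFT objective constructs its positive and negative branches differently.

```python
import torch

def nft_style_loss(v_theta, x0, rewards, baseline=0.0):
    """Reward-weighted flow-matching loss on noised clean images.

    A minimal sketch of the idea described above, not the actual
    DiffusionNFT objective: here rewards simply reweight a standard
    flow-matching MSE. `v_theta(x_t, t)` is assumed to predict velocity.
    """
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device).view(b, 1, 1, 1)  # random flow times in (0, 1)
    noise = torch.randn_like(x0)
    x_t = (1 - t) * x0 + t * noise        # forward (noising) process -- no sampler needed
    target = noise - x0                   # standard flow-matching velocity target
    pred = v_theta(x_t, t.flatten())
    per_sample = ((pred - target) ** 2).mean(dim=(1, 2, 3))
    # Centered rewards act as positive/negative weights: above-baseline
    # samples are reinforced, below-baseline samples are pushed away.
    weights = (rewards - baseline).to(per_sample.dtype)
    return (weights * per_sample).mean()
```

Note that sampling only the forward (noising) process is what makes the method solver-agnostic: any black-box sampler can generate the images, and training never has to backpropagate through it.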
Quick Start & Requirements
Installation involves cloning the repository, creating a Python 3.10.16 Conda environment, installing PyTorch 2.6.0 with CUDA 12.6 support, and then installing the package itself. A variety of reward models are supported (e.g., GenEval, OCR, ClipScore, Aesthetic, UnifiedReward), each with its own installation instructions and checkpoint downloads; UnifiedReward additionally requires launching a separate sglang service. Training is launched with torchrun and requires the WANDB_API_KEY and WANDB_ENTITY environment variables.
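A sketch of the setup and launch flow just described; the repository URL, environment name, script path, and flags are assumptions drawn from this summary, so defer to the project README for the canonical commands.

```bash
# Assumed commands; versions follow the summary, paths and flags are hypothetical.
git clone https://github.com/NVlabs/DiffusionNFT.git
cd DiffusionNFT
conda create -n nft python=3.10.16 -y
conda activate nft
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install -e .

# Training launches via torchrun and expects Weights & Biases credentials.
export WANDB_API_KEY=<your-key>
export WANDB_ENTITY=<your-entity>
torchrun --nproc_per_node=8 scripts/train.py --config config/nft.py  # script path assumed
```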
Highlighted Details
Training is configured via config/nft.py, defaults to 8 GPUs, and is distributed with torchrun. Supported evaluation tasks include geneval, ocr, pickscore, and drawbench; a hypothetical config sketch follows.
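The GPU count and reward task are presumably set in config/nft.py. Below is a hypothetical excerpt in that spirit; the field names and the ml_collections dependency are assumptions, not taken from the repository.

```python
# Hypothetical sketch only; field names and ml_collections are assumed.
import ml_collections

def get_config():
    config = ml_collections.ConfigDict()
    config.num_gpus = 8              # matches the 8-GPU default noted above
    config.reward_fn = "geneval"     # also: ocr, pickscore, drawbench
    config.train_batch_size = 32
    config.learning_rate = 1e-5
    return config
```

Maintenance & Community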
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps are provided in the README. The project's citation year is 2025.
Licensing & Compatibility
The README does not specify a license type or any compatibility notes for commercial use.
Limitations & Caveats
The 2025 citation year suggests very recent work that may still be experimental or rapidly evolving. Setup for some reward models, such as UnifiedReward, requires deploying a separate sglang service. Training defaults to an 8-GPU configuration.