DiffusionNFT  by NVlabs

RL for diffusion models via forward process optimization

Created 1 month ago
381 stars

Top 74.6% on SourcePulse

[Figure: algorithm overview, Online Diffusion Reinforcement with Forward Process]

Summary

DiffusionNFT is an online reinforcement learning method for diffusion models that optimizes the policy directly on the forward diffusion process. It targets researchers and practitioners who want a solver-agnostic, theoretically consistent, and memory-efficient way to fine-tune diffusion models with reward signals, and it integrates into existing flow-matching pipelines.

How It Works

The core idea is to optimize a training policy ($v_\theta$) on noised versions of collected images, using a loss that weights a positive and a negative objective according to the reward. Training therefore needs only clean images, not full sampling trajectories. Because the method is compatible with any black-box sampler and keeps the standard flow-matching objective, it is simple to integrate and flexible to use.
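As a rough illustration of the idea above, the following sketch computes a reward-weighted loss for one noised sample. It is not the repository's implementation: the linear interpolation used for noising, the particular form of the positive and negative terms, the `beta` parameter, and all function names are assumptions made for this example.

```python
import random


def noise_sample(x0, eps, t):
    """Noise a clean sample (assumed form): x_t = (1 - t) * x0 + t * eps."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, eps)]


def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def reward_weighted_loss(v_theta, v_old, x0, eps, r, beta=1.0):
    """Weight a positive and a negative flow-matching term by reward r in [0, 1].

    v_theta: prediction of the policy being trained at the noised sample
    v_old:   prediction of the frozen sampling policy at the same point
    The two combined predictions below are one assumed way to define the
    positive and negative objectives; beta is a hypothetical strength knob.
    """
    target = [b - a for a, b in zip(x0, eps)]  # flow-matching velocity target
    v_pos = [(1.0 - beta) * o + beta * n for o, n in zip(v_old, v_theta)]
    v_neg = [(1.0 + beta) * o - beta * n for o, n in zip(v_old, v_theta)]
    return r * mse(v_pos, target) + (1.0 - r) * mse(v_neg, target)


if __name__ == "__main__":
    random.seed(0)
    x0 = [1.0, 2.0]                         # stand-in for a clean image
    eps = [random.gauss(0.0, 1.0) for _ in x0]
    x_t = noise_sample(x0, eps, t=0.3)      # only the clean image is needed
    v_old = [0.0 for _ in x_t]              # toy predictions, not a real model
    v_theta = [0.1 for _ in x_t]
    print(reward_weighted_loss(v_theta, v_old, x0, eps, r=0.8))
```

Note that the loss takes only a clean sample, fresh noise, and a scalar reward as inputs, which is why no sampling trajectory has to be stored.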

Quick Start & Requirements

Installation involves cloning the repository, setting up a Python 3.10.16 Conda environment, and installing PyTorch 2.6.0 with CUDA 12.6 support, followed by the package itself. A variety of reward models are supported, each with specific installation instructions and checkpoint downloads (e.g., GenEval, OCR, ClipScore, Aesthetic, UnifiedReward). UnifiedReward requires launching a separate sglang service. Training utilizes torchrun and requires WANDB_API_KEY and WANDB_ENTITY.
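Since training requires `WANDB_API_KEY` and `WANDB_ENTITY`, a small preflight check can catch a missing variable before a multi-GPU job starts. The helper below is a hypothetical convenience, not part of the repository; only the two variable names come from the summary above.

```python
import os

# Variable names taken from the requirements above.
REQUIRED_VARS = ("WANDB_API_KEY", "WANDB_ENTITY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Set these variables before training: " + ", ".join(missing))
```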

Highlighted Details

  • Supports a comprehensive suite of reward models including GenEval, OCR, PickScore, ClipScore, HPSv2.1, Aesthetic, ImageReward, and UnifiedReward.
  • Training is configured via config/nft.py, defaulting to 8 GPUs, and uses torchrun for distribution.
  • Evaluation supports both Hugging Face LoRA checkpoints and local checkpoints, with datasets like geneval, ocr, pickscore, and drawbench.
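To make the 8-GPU default concrete, the sketch below assembles a single-node `torchrun` command. The summary does not name the training entry point or its arguments, so the script path is a caller-supplied placeholder and `build_torchrun_cmd` is a hypothetical helper; `--standalone` and `--nproc_per_node` are standard `torchrun` options.

```python
import shlex


def build_torchrun_cmd(script, nproc_per_node=8, script_args=()):
    """Build a single-node torchrun command as an argument list.

    script:      path to the training entry point (placeholder, not from the README)
    script_args: extra arguments passed through to that script unchanged
    """
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    return [
        "torchrun",
        "--standalone",
        f"--nproc_per_node={nproc_per_node}",
        script,
        *script_args,
    ]


if __name__ == "__main__":
    # "train.py" is a stand-in name; substitute the repository's actual script.
    print(shlex.join(build_torchrun_cmd("train.py", nproc_per_node=2)))
```

Lowering `nproc_per_node` is the usual way to run on fewer GPUs, though the settings in config/nft.py may also need adjusting to match.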

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps are provided in the README. The project's citation year is 2025.

Licensing & Compatibility

The README does not specify a license type or any compatibility notes for commercial use.

Limitations & Caveats

The 2025 citation year suggests a recent project whose code may still be experimental or rapidly evolving. Setting up some reward models is involved: UnifiedReward, for example, requires deploying a separate sglang service. Training defaults to an 8-GPU configuration.

Health Check
  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 18
  • Star history: 162 stars in the last 30 days
