NVlabs/DiffusionNFT: RL for diffusion models via forward process optimization
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Algorithm Overview
Summary
DiffusionNFT introduces a novel online reinforcement learning paradigm for diffusion models, optimizing policies directly on the forward diffusion process. It targets researchers and practitioners seeking a solver-agnostic, theoretically consistent, and memory-efficient method for fine-tuning diffusion models with reward signals, offering straightforward integration into existing flow-matching pipelines.
How It Works
The core innovation is optimizing the training policy ($v_\theta$) directly on noised versions of collected images, with a loss that weights positive and negative objectives according to sample rewards. Because no full sampling trajectories are needed, only clean images are required for training. Compatibility with any black-box sampler and adherence to the standard flow-matching objective simplify integration and enhance flexibility.
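To make the idea above concrete, here is a minimal sketch of a reward-weighted flow-matching update on noised clean images. The function name, the velocity-prediction interface `v_theta(x_t, t)`, and the simple centered-reward weighting are illustrative assumptions; the actual DiffusionNFT objective constructs its positive and negative branches differently.

```python
import torch

def nft_style_loss(v_theta, x0, rewards, baseline=0.0):
    """Reward-weighted flow-matching loss on noised clean images.

    A minimal sketch of the idea described above, not the actual
    DiffusionNFT objective: here rewards simply reweight a standard
    flow-matching MSE. `v_theta(x_t, t)` is assumed to predict velocity.
    """
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device).view(b, 1, 1, 1)  # random flow times in (0, 1)
    noise = torch.randn_like(x0)
    x_t = (1 - t) * x0 + t * noise        # forward (noising) process -- no sampler needed
    target = noise - x0                   # standard flow-matching velocity target
    pred = v_theta(x_t, t.flatten())
    per_sample = ((pred - target) ** 2).mean(dim=(1, 2, 3))
    # Centered rewards act as positive/negative weights: above-baseline
    # samples are reinforced, below-baseline samples are pushed away.
    weights = (rewards - baseline).to(per_sample.dtype)
    return (weights * per_sample).mean()
```

Note that sampling only the forward (noising) process is what makes the method solver-agnostic: any black-box sampler can generate the images, and training never has to backpropagate through it.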
Quick Start & Requirements
Installation involves cloning the repository, creating a Python 3.10.16 Conda environment, installing PyTorch 2.6.0 with CUDA 12.6 support, and then installing the package itself. A variety of reward models are supported (e.g., GenEval, OCR, ClipScore, Aesthetic, UnifiedReward), each with its own installation instructions and checkpoint downloads; UnifiedReward additionally requires launching a separate sglang service. Training is launched with torchrun and requires the WANDB_API_KEY and WANDB_ENTITY environment variables.
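A sketch of the setup and launch flow just described; the repository URL, environment name, script path, and flags are assumptions drawn from this summary, so defer to the project README for the canonical commands.

```bash
# Assumed commands; versions follow the summary, paths and flags are hypothetical.
git clone https://github.com/NVlabs/DiffusionNFT.git
cd DiffusionNFT
conda create -n nft python=3.10.16 -y
conda activate nft
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install -e .

# Training launches via torchrun and expects Weights & Biases credentials.
export WANDB_API_KEY=<your-key>
export WANDB_ENTITY=<your-entity>
torchrun --nproc_per_node=8 scripts/train.py --config config/nft.py  # script path assumed
```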
Highlighted Details
Training is configured via config/nft.py, defaults to 8 GPUs, and is distributed with torchrun. Supported evaluation tasks include geneval, ocr, pickscore, and drawbench; a hypothetical config sketch follows.
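The GPU count and reward task are presumably set in config/nft.py. Below is a hypothetical excerpt in that spirit; the field names and the ml_collections dependency are assumptions, not taken from the repository.

```python
# Hypothetical sketch only; field names and ml_collections are assumed.
import ml_collections

def get_config():
    config = ml_collections.ConfigDict()
    config.num_gpus = 8              # matches the 8-GPU default noted above
    config.reward_fn = "geneval"     # also: ocr, pickscore, drawbench
    config.train_batch_size = 32
    config.learning_rate = 1e-5
    return config
```

Maintenance & Community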
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps are provided in the README. The project's citation year is 2025.
Licensing & Compatibility
The README does not specify a license type or any compatibility notes for commercial use.
Limitations & Caveats
The 2025 citation year suggests very recent work that may still be experimental or rapidly evolving. Setup for some reward models, such as UnifiedReward, requires deploying a separate sglang service. Training defaults to an 8-GPU configuration.