SRPO by Tencent-Hunyuan

Diffusion model fine-tuning aligned with human preference

Created 6 months ago
1,262 stars

Top 31.2% on SourcePulse

Project Summary

Tencent-Hunyuan/SRPO (Semantic Relative Preference Optimization) introduces "Direct Align," a novel sampling strategy for fine-tuning diffusion models. It improves optimization stability and computational efficiency, particularly when restoring heavily noised images from early diffusion timesteps. The method targets researchers and engineers who want better diffusion model performance and controllability, offering faster training and fine-grained preference alignment without common reward-hacking failure modes.

How It Works

SRPO applies "Direct Align," which uses analytical gradients for direct optimization: it restores noisy images effectively, improves stability, and reduces computational load. Rather than relying on a separate regularizer, it regularizes the model directly with negative rewards, sidestepping reward-hacking artifacts such as color oversaturation. The approach also supports dynamically controllable text conditions, allowing reward preferences to be adjusted on the fly for more nuanced fine-tuning.
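The mechanics can be illustrated with a toy sketch, not the authors' actual implementation: a one-parameter "model" restores a noisy sample in a single analytical step, and the parameter is updated by gradient ascent on a reward minus a penalty derived from a negative reward. All names, the quadratic reward, and the overshoot penalty are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
target = 2.0  # stand-in for "what the reward model prefers"

def restore(w, x_noisy, noise):
    # One-step analytical restoration: subtract a learned fraction of the noise.
    return x_noisy - w * noise

w = 0.0
lr = 0.05
for _ in range(200):
    clean = target + 0.1 * rng.standard_normal()
    noise = rng.standard_normal()
    x_noisy = clean + noise
    x = restore(w, x_noisy, noise)
    dx_dw = -noise  # analytical gradient of the restored sample w.r.t. w
    # Ascend the toy reward -(x - target)^2 ...
    grad = (-2.0 * (x - target)) * dx_dw
    # ... minus a "negative reward" penalizing overshoot (a stand-in for
    # artifacts like oversaturation), used directly as a regularizer.
    grad -= 2.0 * np.maximum(x - target, 0.0) * dx_dw
    w += lr * grad

print(f"learned w = {w:.2f}")  # approaches 1.0, i.e. full noise removal
```

Because the restoration step is analytical, the gradient flows through it in closed form, which is the property that makes this style of update cheap and stable in the toy.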

Quick Start & Requirements

Installation requires Python 3.10.16 and running bash ./env_setup.sh within a Conda environment. Users must download the SRPO checkpoint (diffusion_pytorch_model.safetensors) and the base FLUX.1-dev model from Hugging Face via huggingface-cli. Inference is supported via a Python script or ComfyUI integration using a provided workflow JSON.
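The steps above amount to roughly the following commands. This is a paraphrase of the summary, not verbatim from the repository: the Conda environment name and the SRPO Hugging Face repo id are assumptions, and FLUX.1-dev is the base model named above.

```shell
# Environment described in the summary (Python 3.10.16 inside Conda;
# the environment name "srpo" is an assumption).
conda create -n srpo python=3.10.16 -y
conda activate srpo
bash ./env_setup.sh

# Base model named in the summary; the SRPO repo id below is an assumption.
huggingface-cli download black-forest-labs/FLUX.1-dev --local-dir ./flux
huggingface-cli download tencent/SRPO diffusion_pytorch_model.safetensors --local-dir ./srpo
```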

Highlighted Details

  • Achieved Top 1 among open-source text-to-image models on the Artificial Analysis leaderboard (Oct 2025).
  • "Direct Align" offers stable, efficient optimization, achieving significant FLUX.1-dev improvements in under 10 minutes of training.
  • Avoids reward hacking (e.g. color oversaturation) by using negative rewards directly as regularization.
  • Enables controllable fine-tuning via dynamically adjustable text conditions for reward preference.

Maintenance & Community

Key components were published in September 2025; the training code, built on Qwen-Image, was released in February 2026. Support for additional models is planned but not yet complete. Discussion is encouraged via repository issues.

Licensing & Compatibility

The provided README does not specify a software license; clarification from the maintainers is needed before commercial use or closed-source linking.

Limitations & Caveats

Training with alternative reward models like PickScore may yield suboptimal results due to control word mismatches. Support for additional models is not yet implemented. Training requires substantial GPU memory, necessitating techniques like VAE gradient checkpointing.
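As an illustration of the memory-saving technique mentioned above, gradient checkpointing recomputes activations during the backward pass instead of storing them. The tiny decoder below is invented for this sketch and is not the project's VAE; it only shows the mechanism via PyTorch's standard `torch.utils.checkpoint`.

```python
import torch
from torch.utils.checkpoint import checkpoint

class TinyDecoder(torch.nn.Module):
    """A stand-in decoder; a real VAE decoder would use conv blocks."""
    def __init__(self):
        super().__init__()
        self.blocks = torch.nn.ModuleList(
            [torch.nn.Linear(16, 16) for _ in range(4)]
        )

    def forward(self, z, use_checkpoint=False):
        for block in self.blocks:
            if use_checkpoint:
                # Activations inside `block` are not stored; they are
                # recomputed on backward, trading compute for memory.
                z = checkpoint(block, z, use_reentrant=False)
            else:
                z = block(z)
        return z

decoder = TinyDecoder()
z = torch.randn(2, 16, requires_grad=True)
out = decoder(z, use_checkpoint=True)
out.pow(2).mean().backward()  # gradients flow through the checkpoints
```

The gradients are identical to the non-checkpointed forward pass; only peak activation memory changes, which is why the technique helps when the reward signal must be backpropagated through a large VAE decode.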

Health Check

Last Commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 1
Star History: 12 stars in the last 30 days

