Tencent-Hunyuan/SRPO: diffusion model fine-tuning aligned with human preference
Tencent-Hunyuan/SRPO introduces "Direct Align," a novel sampling strategy for fine-tuning diffusion models. It enhances optimization stability and computational efficiency, particularly for restoring noisy images in early diffusion timesteps. This method targets researchers and engineers seeking improved diffusion model performance and controllability, offering faster training and fine-grained preference alignment without common reward hacking issues.
How It Works
SRPO utilizes "Direct Align" with analytical gradients for direct optimization, effectively restoring noisy images and improving stability while reducing computational load. It directly regularizes models using negative rewards, circumventing reward hacking issues like color oversaturation. The approach also incorporates dynamically controllable text conditions for on-the-fly reward preference adjustments, enabling more nuanced fine-tuning.
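The single-step restoration idea behind Direct Align can be illustrated with a toy sketch. This is a minimal NumPy example, not the repository's implementation: because the injected noise is known during training, the clean latent can be recovered analytically in one step rather than through an iterative denoising loop, which is what keeps optimization stable even at very noisy early timesteps. All names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

x0 = rng.standard_normal((4, 4))      # toy "clean image" latent
alpha_bar = 0.05                      # early, heavily noised timestep
eps = rng.standard_normal(x0.shape)   # injected noise, known a priori

# Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps

def direct_restore(x_t, eps_pred, alpha_bar):
    """One-step analytical restoration of the clean latent from x_t."""
    return (x_t - np.sqrt(1 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)

# With a perfect noise prediction the clean latent is recovered exactly,
# even at this very noisy timestep -- no multi-step sampling needed.
x0_hat = direct_restore(x_t, eps, alpha_bar)
print(np.allclose(x0_hat, x0))  # True
```

In the actual method the restored image is scored by a reward model and gradients flow back analytically through this closed-form restoration; the sketch only shows why the restoration itself is exact when the injected noise is known.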
Quick Start & Requirements
Installation requires Python 3.10.16 and running bash ./env_setup.sh within a Conda environment. Users must download the SRPO checkpoint (diffusion_pytorch_model.safetensors) and the base FLUX.1-dev model from Hugging Face via huggingface-cli. Inference is supported via a Python script or ComfyUI integration using a provided workflow JSON.
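The setup described above might look like the following. The Hugging Face repo IDs and local paths are assumptions for illustration; confirm the exact commands and repo names in the repository README.

```shell
# Create the Conda environment (Python 3.10.16) and install dependencies
conda create -n srpo python=3.10.16 -y
conda activate srpo
bash ./env_setup.sh

# Download the SRPO checkpoint and the FLUX.1-dev base model
# (repo IDs below are illustrative -- verify them on Hugging Face)
huggingface-cli download Tencent-Hunyuan/SRPO diffusion_pytorch_model.safetensors --local-dir ./ckpt
huggingface-cli download black-forest-labs/FLUX.1-dev --local-dir ./flux.1-dev
```

Note that FLUX.1-dev is a gated model, so `huggingface-cli login` with an authorized token may be required before downloading.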
Highlighted Details
Maintenance & Community
Training code was released in February 2026, built on Qwen-Image. Key components were published in September 2025. Support for additional models is planned but not yet complete. Discussions are encouraged via repository issues.
Licensing & Compatibility
The provided README does not specify a software license, requiring clarification for commercial use or closed-source linking.
Limitations & Caveats
Training with alternative reward models like PickScore may yield suboptimal results due to control word mismatches. Support for additional models is not yet implemented. Training requires substantial GPU memory, necessitating techniques like VAE gradient checkpointing.