SPO by RockeyCoss

Diffusion model fine-tuning for enhanced image aesthetics

Created 1 year ago

264 stars

Top 96.7% on SourcePulse

Project Summary

SPO addresses the challenge of enhancing aesthetic quality in diffusion models for text-to-image generation. It targets researchers and practitioners seeking to improve visual appeal without compromising prompt alignment, offering a more efficient optimization approach by focusing on step-specific visual details.

How It Works

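SPO introduces step-by-step preference optimization. Existing DPO-style methods take a preference label assigned to a pair of final images and propagate it to every intermediate denoising step; SPO instead builds a separate preference pair at each step. At each denoising step, it samples a pool of candidate images from the same starting latent, uses a step-aware preference model to select a win-lose pair for supervision, and initializes the next step with a single candidate from the pool. Because the candidates at a given step share a starting point, they differ only in fine visual detail, so the supervision targets aesthetics rather than overall layout. The project reports significant aesthetic gains from this fine-grained approach.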
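The per-step loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the repository's code: `toy_denoise_step`, `toy_step_aware_score`, `select_win_lose`, and `spo_rollout` are hypothetical names, the latent is a short list of floats, and the scoring function stands in for a learned step-aware preference model. The sketch also assumes the next step's starting candidate is drawn at random from the pool, and it stops at collecting win-lose pairs rather than computing a training loss.

```python
import random


def toy_denoise_step(latent, t, rng):
    """Hypothetical stand-in for one stochastic denoising step x_t -> x_{t-1}."""
    shrink = 0.9            # crude proxy for denoising: pull values toward zero
    noise_scale = 0.1 * t   # more randomness at earlier (noisier) steps
    return [shrink * v + rng.gauss(0.0, noise_scale) for v in latent]


def toy_step_aware_score(latent, t):
    """Hypothetical step-aware preference score (higher means preferred).

    A real preference model would be learned and conditioned on the prompt
    and the timestep; here the score just depends on the latent and t.
    """
    return -sum(v * v for v in latent) / (1.0 + t)


def select_win_lose(candidates, t):
    """Return the indices of the best- and worst-scoring candidates."""
    scores = [toy_step_aware_score(c, t) for c in candidates]
    indices = range(len(candidates))
    win = max(indices, key=scores.__getitem__)
    lose = min(indices, key=scores.__getitem__)
    return win, lose


def spo_rollout(latent, num_steps=5, num_candidates=4, seed=0):
    """Collect one (t, winner, loser) pair per denoising step."""
    rng = random.Random(seed)
    pairs = []
    for t in range(num_steps, 0, -1):
        # Every candidate at this step starts from the same latent.
        candidates = [
            toy_denoise_step(latent, t, rng) for _ in range(num_candidates)
        ]
        win, lose = select_win_lose(candidates, t)
        pairs.append((t, candidates[win], candidates[lose]))
        # Assumption of this sketch: continue from a randomly drawn candidate.
        latent = rng.choice(candidates)
    return pairs


if __name__ == "__main__":
    for t, winner, loser in spo_rollout([1.0, -2.0, 0.5]):
        gap = toy_step_aware_score(winner, t) - toy_step_aware_score(loser, t)
        print(f"step {t}: score gap between winner and loser = {gap:.4f}")
```

In an actual training setup, each collected pair would feed a preference loss on the diffusion model at timestep t; that part is omitted here because it depends on details the summary does not give.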

Quick Start & Requirements

The repository provides code for training and inference, along with pre-trained checkpoints for Stable Diffusion v1.5 and SDXL (e.g., SPO-SDXL_4k-prompts_10-epochs). Dependencies include Diffusers, D3PO, and PickScore. A GPU is not listed as a requirement but is needed in practice for diffusion model training and inference. Setup instructions are given under spo_training_and_inference and step_aware_preference_model in the repository.

Highlighted Details

Achieves significant aesthetic improvements over existing DPO methods on Stable Diffusion v1.5 and SDXL.
Preserves image-text alignment comparable to vanilla models.
Demonstrates substantially faster convergence during fine-tuning compared to DPO.
Economically leverages generic preference data by focusing on step-specific visual details.

Maintenance & Community

The project accompanies the CVPR 2025 paper "Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization" and is maintained by its authors. The README provides no community channels (e.g., Discord, Slack) and no roadmap link.

Licensing & Compatibility

The README does not specify a software license. Without a stated license, reuse rights are undefined, which may block commercial or closed-source integration.

Limitations & Caveats

As a recent research contribution (pre-print June 2024), the project may be subject to ongoing development. Some implementation details or features might still be in progress, as indicated by TODO items in the README. The primary focus is on aesthetic post-training, and performance on other metrics or model architectures is not detailed.

Health Check

Last Commit

10 months ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days