RockeyCoss
Diffusion model fine-tuning for enhanced image aesthetics
Top 99.1% on SourcePulse
SPO (Step-by-step Preference Optimization) addresses the challenge of enhancing aesthetic quality in diffusion models for text-to-image generation. It targets researchers and practitioners seeking to improve visual appeal without compromising prompt alignment, offering a more efficient optimization approach by focusing on step-specific visual details.
How It Works
SPO introduces step-by-step preference optimization, departing from traditional DPO's propagation strategy. At each denoising step, it samples candidate images, uses a step-aware preference model to select a win-lose pair for supervision, and initializes the next step with a chosen candidate. This fine-grained approach focuses on subtle visual differences, leading to significant aesthetic gains.
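The per-step loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the repository's code: `denoise_step` and `step_aware_score` are hypothetical stand-ins for the diffusion sampler and the step-aware preference model, latents are plain lists of floats, and picking the next starting point uniformly at random from the candidates is an assumption about what "a chosen candidate" means.

```python
import random


def denoise_step(x, t, rng):
    """Hypothetical stand-in for one stochastic denoising step.

    Shrinks the latent toward zero and adds a little Gaussian noise,
    so repeated calls from the same x give different candidates.
    """
    return [0.9 * v + rng.gauss(0.0, 0.1) for v in x]


def step_aware_score(x, t):
    """Hypothetical stand-in for the step-aware preference model.

    Higher is better; here simply the negative squared norm of the latent.
    """
    return -sum(v * v for v in x)


def collect_step_pairs(x_start, num_steps, num_candidates, rng):
    """Roll out the sampler and record one win-lose pair per denoising step."""
    x = x_start
    pairs = []
    for t in range(num_steps, 0, -1):
        # Sample several candidates from the same current latent.
        candidates = [denoise_step(x, t, rng) for _ in range(num_candidates)]
        # Rank them with the (toy) step-aware scorer.
        ranked = sorted(candidates, key=lambda c: step_aware_score(c, t))
        lose, win = ranked[0], ranked[-1]
        pairs.append({"t": t, "start": x, "win": win, "lose": lose})
        # Assumption: the next step starts from a randomly chosen candidate.
        x = rng.choice(candidates)
    return x, pairs


rng = random.Random(0)
final_latent, pairs = collect_step_pairs([1.0, -1.0], num_steps=5,
                                         num_candidates=4, rng=rng)
print(len(pairs), "win-lose pairs collected")
```

Because both members of each pair are sampled from the same starting latent, they differ only in what that single step changed, which is what lets the supervision focus on fine visual detail. The preference loss that would consume these pairs is omitted here.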
Quick Start & Requirements
The repository provides code for training and inference, along with pre-trained checkpoints for Stable Diffusion v1.5 and SDXL (e.g., SPO-SDXL_4k-prompts_10-epochs). Dependencies include Diffusers, D3PO, and PickScore. No hardware requirements are stated, but diffusion model training and inference will in practice need a GPU. Specific setup instructions are detailed in spo_training_and_inference and step_aware_preference_model.
Highlighted Details
Maintenance & Community
The project is associated with authors from the CVPR 2025 paper "Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization." No community channels (e.g., Discord, Slack) or roadmap links are provided in the README.
Licensing & Compatibility
The README does not specify a software license. This lack of clarity may pose compatibility issues for commercial or closed-source integration.
Limitations & Caveats
As a recent research contribution (pre-print June 2024), the project may be subject to ongoing development. Some implementation details or features might still be in progress, as indicated by TODO items in the README. The primary focus is on aesthetic post-training, and performance on other metrics or model architectures is not detailed.
7 months ago
Inactive