Text-to-image generation research paper using reinforcement learning
T2I-R1 is a novel text-to-image generation model that enhances image quality and prompt alignment through a bi-level Chain-of-Thought (CoT) reasoning process optimized with reinforcement learning. It targets researchers and developers in generative AI seeking to improve the controllability and fidelity of text-to-image synthesis.
How It Works
T2I-R1 employs a dual CoT strategy: a Semantic-level CoT that plans the global image structure and the objects in it, and a Token-level CoT that handles fine-grained pixel generation and coherence. Both levels are optimized jointly with BiCoT-GRPO, an RL approach that integrates an ensemble of reward functions (HPS, GIT, GroundingDINO, ORM) within a single training step. This bi-level optimization aims to give more explicit control over the generation process and, in turn, better image quality and prompt alignment.
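The reward side of a GRPO-style update can be illustrated with a minimal sketch. It is not the project's implementation: the function names are hypothetical, the equal-weight average over reward functions is an assumption (the summary does not say how HPS, GIT, GroundingDINO and ORM scores are combined), and the population standard deviation is one of several normalization choices. The sketch only shows the general idea of scoring a group of samples for one prompt and turning the scores into group-relative advantages.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence

# Hypothetical reward signature: (prompt, sample identifier) -> scalar score.
RewardFn = Callable[[str, str], float]


def ensemble_reward(prompt: str, sample: str, reward_fns: Sequence[RewardFn]) -> float:
    """Combine several reward functions into one score.

    Assumption: an unweighted average; the actual weighting is not stated.
    """
    return mean(fn(prompt, sample) for fn in reward_fns)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    are judged relative to their siblings rather than on an absolute scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Toy stand-ins for real reward models, for illustration only.
    toy_fns: list[RewardFn] = [
        lambda prompt, sample: float(len(sample)),          # placeholder score
        lambda prompt, sample: 1.0 if "cat" in sample else 0.0,
    ]
    samples = ["cat_a", "dog_bb", "cat_ccc", "bird"]
    rewards = [ensemble_reward("a cat on a sofa", s, toy_fns) for s in samples]
    print(group_relative_advantages(rewards))
```

In a bi-level setup, the same group-relative advantage would presumably weight the policy-gradient terms for both the semantic-level and the token-level tokens of each sample, but that wiring is not shown here.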
Quick Start & Requirements
Create a conda environment (conda create -n t2i-r1 python=3.10), activate it, install PyTorch/TorchVision per the official instructions, and then run pip install -r requirements.txt from the src directory.
Highlighted Details
Maintenance & Community
The project is associated with the paper "T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT" accepted at CVPR 2025. The repository is the official release for this work.
Licensing & Compatibility
Released under Apache License 2.0. Checkpoints are for research purposes only. Users are permitted to create images but must comply with local laws and use responsibly.
Limitations & Caveats
The project is presented as a research release with checkpoints intended solely for research purposes. The README indicates that the ORM checkpoint and reward code are coming soon, and that general checkpoints are expected within two weeks.