T2I-R1 by CaraJ7

Text-to-image generation research paper using reinforcement learning

created 3 months ago
373 stars

Top 77.3% on sourcepulse

Project Summary

T2I-R1 is a novel text-to-image generation model that enhances image quality and prompt alignment through a bi-level Chain-of-Thought (CoT) reasoning process optimized with reinforcement learning. It targets researchers and developers in generative AI seeking to improve the controllability and fidelity of text-to-image synthesis.

How It Works

T2I-R1 uses two levels of CoT. Semantic-level CoT plans the global image structure and the objects in it, and token-level CoT covers the patch-by-patch generation of image tokens, which governs fine-grained detail and local coherence. Both levels are optimized together by BiCoT-GRPO, a GRPO-based RL method that updates both CoTs in the same training step and scores each generated image with an ensemble of reward models (HPS, GIT, GroundingDINO, ORM). This bi-level optimization gives more explicit control over the generation process, with the goal of better image quality and prompt alignment.
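The reward side of a GRPO-style update can be sketched in a few lines. This is a minimal illustration, not the repository's code: it assumes each reward model is a callable returning a scalar, combines them with an equal-weight mean (an assumption; the actual weighting may differ), and standardizes rewards within the group of samples drawn for one prompt, which is the group-relative advantage that gives GRPO its name. The names `ensemble_reward`, `group_relative_advantages`, and the stub scorers are hypothetical.

```python
import statistics
from typing import Callable, Dict, List

# Hypothetical interface: a reward model scores (prompt, image) with a scalar.
RewardFn = Callable[[str, object], float]


def ensemble_reward(prompt: str, image: object, reward_fns: Dict[str, RewardFn]) -> float:
    """Combine several reward models into one scalar (equal-weight mean, an assumption)."""
    scores = [fn(prompt, image) for fn in reward_fns.values()]
    return sum(scores) / len(scores)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward within its group of samples."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Stub scorers standing in for real reward models such as HPS or GroundingDINO;
    # here each "image" is just an integer placeholder.
    stubs: Dict[str, RewardFn] = {
        "stub_aesthetic": lambda prompt, image: 0.1 * image,
        "stub_alignment": lambda prompt, image: 1.0 - 0.05 * image,
    }
    prompt = "a red cube on top of a blue sphere"
    group = [0, 1, 2, 3]  # G = 4 samples drawn for the same prompt
    rewards = [ensemble_reward(prompt, img, stubs) for img in group]
    print(group_relative_advantages(rewards))
```

Under one reading of "optimized collaboratively", each sample's advantage would weight the log-probabilities of both its semantic-level CoT tokens and its image tokens in the policy loss; treat that as an interpretation to check against the paper, not a description of the released implementation.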

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (conda create -n t2i-r1 python=3.10), activate it, install PyTorch/TorchVision per official instructions, and then pip install -r requirements.txt from the src directory.
  • Prerequisites: GroundingDINO and LLaVA-NeXT are required for specific reward models. Reward model checkpoints (HPS, GIT, GroundingDINO, ORM) must be downloaded separately.
  • Setup: Requires downloading multiple checkpoints and potentially modifying dependencies for specific reward models.
  • Links: Official Paper, CVPR 2025 Previous Work

Highlighted Details

  • Reinforces image generation using collaborative semantic-level and token-level Chain-of-Thought (CoT).
  • Utilizes BiCoT-GRPO with an ensemble of reward functions (HPS, GIT, GroundingDINO, ORM).
  • Codebase modified for DeepSpeed ZeRO-3 training and bundles the reward model repositories it depends on.
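ZeRO stage 3 partitions optimizer states, gradients, and model parameters across GPUs, which is what makes RL training of a large generator alongside several reward models fit in memory. The config below is a generic DeepSpeed ZeRO-3 example, not the file shipped with T2I-R1; the `"auto"` values assume training runs through the Hugging Face Trainer integration (an assumption), which fills them in from its own arguments. `ZERO3_CONFIG` and `write_config` are hypothetical names.

```python
import json

# Generic DeepSpeed ZeRO stage-3 settings; NOT the repository's own config.
# "auto" entries are resolved by the Hugging Face Trainer's DeepSpeed integration.
ZERO3_CONFIG = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,                      # shard params, grads, and optimizer states
        "overlap_comm": True,            # overlap communication with computation
        "contiguous_gradients": True,    # reduce memory fragmentation for gradients
        "stage3_gather_16bit_weights_on_model_save": True,  # save a full 16-bit model
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
}


def write_config(path: str) -> None:
    """Write the config to a JSON file that a DeepSpeed-enabled launcher can read."""
    with open(path, "w") as f:
        json.dump(ZERO3_CONFIG, f, indent=2)
```

With the Hugging Face Trainer, the `deepspeed` field of `TrainingArguments` accepts either such a dict or the path to the written JSON file.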

Maintenance & Community

The project is associated with the paper "T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT" accepted at CVPR 2025. The repository is the official release for this work.

Licensing & Compatibility

Released under the Apache License 2.0. Checkpoints are for research purposes only. Users may generate images with the models but must comply with local laws and use them responsibly.

Limitations & Caveats

The project is a research release, and its checkpoints are intended solely for research purposes. According to the README, the ORM checkpoint and reward code are "coming soon" and the model checkpoints are expected within two weeks; both timelines are relative to when the README was written.

Health Check

Last commit: 6 days ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 3
Star History: 200 stars in the last 90 days
