T2I-R1 by CaraJ7

Text-to-image generation research paper using reinforcement learning

Created 10 months ago

430 stars

Top 69.0% on SourcePulse

Project Summary

T2I-R1 is a novel text-to-image generation model that enhances image quality and prompt alignment through a bi-level Chain-of-Thought (CoT) reasoning process optimized with reinforcement learning. It targets researchers and developers in generative AI seeking to improve the controllability and fidelity of text-to-image synthesis.

How It Works

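T2I-R1 uses two levels of CoT. The semantic-level CoT is a textual plan, written before any image tokens are generated, that lays out the global structure of the image and the objects in it. The token-level CoT is the patch-by-patch generation of image tokens itself, which handles fine-grained detail and local coherence. Both levels are optimized jointly with BiCoT-GRPO, a GRPO-based reinforcement learning method that updates both levels within the same training step. It scores each generated image with an ensemble of reward models (HPS, GIT, GroundingDINO, ORM). This bi-level optimization is intended to give more explicit control over the generation process, which in turn should improve image quality and prompt alignment.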
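The bi-level rollout and the ensemble-reward step described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the project's implementation. `plan_fn`, `token_fn`, and the two reward functions are hypothetical stand-ins for the unified generation model and for the HPS/GIT/GroundingDINO/ORM experts. The ensemble is assumed to be an unweighted mean. The policy-gradient update that would consume the advantages is omitted.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins (assumptions, not T2I-R1's API): in the real system both
# CoT levels come from one model and rewards come from vision expert models.
PlanFn = Callable[[str, random.Random], str]
TokenFn = Callable[[str, str, List[int], random.Random], int]
RewardFn = Callable[[str, List[int]], float]


def rollout(prompt: str, plan_fn: PlanFn, token_fn: TokenFn,
            num_tokens: int, rng: random.Random) -> Tuple[str, List[int]]:
    """Semantic-level CoT (a text plan), then token-level CoT (image tokens)."""
    plan = plan_fn(prompt, rng)                  # global structure / object plan
    tokens: List[int] = []
    for _ in range(num_tokens):                  # patch-by-patch generation
        tokens.append(token_fn(prompt, plan, tokens, rng))
    return plan, tokens


def ensemble_reward(prompt: str, tokens: List[int],
                    reward_fns: Sequence[RewardFn]) -> float:
    """Score the final image with several reward models; unweighted mean assumed."""
    return statistics.fmean(fn(prompt, tokens) for fn in reward_fns)


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward within its sample group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# --- Toy components (hypothetical) ---
def toy_plan(prompt: str, rng: random.Random) -> str:
    return f"plan: {prompt}, {rng.choice(['left', 'right'])} lighting"


def toy_token(prompt: str, plan: str, prefix: List[int], rng: random.Random) -> int:
    return rng.randrange(16)                     # 16-entry toy codebook


def brightness(prompt: str, tokens: List[int]) -> float:
    return sum(tokens) / (15 * len(tokens))      # toy "aesthetic" score in [0, 1]


def diversity(prompt: str, tokens: List[int]) -> float:
    return len(set(tokens)) / 16                 # toy "detail" score in [0, 1]


rng = random.Random(0)
prompt = "a red cube"
group = [rollout(prompt, toy_plan, toy_token, num_tokens=8, rng=rng) for _ in range(4)]
rewards = [ensemble_reward(prompt, toks, [brightness, diversity]) for _, toks in group]
advs = group_advantages(rewards)
print([round(a, 3) for a in advs])
```

Normalizing rewards within a group of samples for the same prompt lets GRPO work without a learned value network. Because only the final image is scored, in this sketch the same per-sample advantage would apply to both the plan text and the image tokens of that sample.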

Quick Start & Requirements

Installation: Clone the repository, create a conda environment (conda create -n t2i-r1 python=3.10), and activate it. Install PyTorch and TorchVision following the official instructions, then run pip install -r requirements.txt from the src directory.
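Once the environment is set up, a quick sanity check can confirm the Python version and report which prerequisites are importable. This is a hypothetical helper, not part of the repository. The module names `groundingdino` (GroundingDINO) and `llava` (LLaVA-NeXT) are assumptions about how those packages install, so adjust them to match your setup.

```python
import importlib.util
import sys
from typing import Dict, Iterable

# Assumed import names; GroundingDINO and LLaVA-NeXT may differ in your install.
REQUIRED_MODULES = ("torch", "torchvision", "groundingdino", "llava")
EXPECTED_PYTHON = (3, 10)  # matches the conda environment above


def check_environment(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Report which prerequisites are importable, without actually importing them."""
    report = {"python==3.10": sys.version_info[:2] == EXPECTED_PYTHON}
    for name in modules:
        # find_spec returns None for a missing top-level module instead of raising.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{'OK  ' if ok else 'FAIL'} {item}")
```

Using `find_spec` instead of `import` keeps the check fast and avoids initializing CUDA or loading large packages just to test for their presence.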
Prerequisites: GroundingDINO and LLaVA-NeXT are required for specific reward models. Reward model checkpoints (HPS, GIT, GroundingDINO, ORM) must be downloaded separately.
Setup: Expect to download several checkpoints and, for some reward models, to adjust their dependencies.
Links: Official Paper; Previous Work (CVPR 2025)

Highlighted Details

Reinforces image generation using collaborative semantic-level and token-level Chain-of-Thought (CoT).
Utilizes BiCoT-GRPO with an ensemble of reward functions (HPS, GIT, GroundingDINO, ORM).
Codebase modified to support DeepSpeed ZeRO-3 training and bundles the reward model repositories it depends on.

Maintenance & Community

The repository is the official code release for the paper "T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT". The CVPR 2025 paper listed under Links is the authors' previous work, not T2I-R1 itself.

Licensing & Compatibility

Released under the Apache License 2.0. Checkpoints are for research purposes only. Users may generate images with the model but must comply with local laws and use it responsibly.

Limitations & Caveats

This is a research release, and its checkpoints are intended for research purposes only. At the time this summary was written, the README stated that the ORM checkpoint and reward code were coming soon and that other checkpoints would be released within two weeks.

Explore Similar Projects

GoT by rongyaofang

Comfyui_Comfly by ainewsto

OmniGen2 by VectorSpaceLab

SEED-X by AILab-CVC

Image-Generation-CoT by ZiyuGuo99

Seg-Zero by JIA-Lab-research

NextCreator by MoonWeSif

TextRL by voidful

TrainPPTAgent by johnson7788

GLIGEN by gligen

big-sleep by lucidrains

understand-prompt by phodal