Evaluation benchmark for compositional text-to-image generation
Top 94.7% on sourcepulse
T2I-CompBench(++) provides a comprehensive benchmark and evaluation framework for compositional text-to-image generation models. It addresses the need for standardized evaluation of complex prompts involving attributes, spatial relationships, and numeracy, targeting researchers and developers in the AI image generation space. The benchmark offers a robust methodology for assessing model performance on these challenging compositional tasks.
How It Works
The framework employs a multi-faceted evaluation approach using specialized models and metrics. It leverages BLIP-VQA for attribute binding, UniDet for 2D/3D spatial relationships and numeracy, and CLIPScore for non-spatial relationships. Additionally, it supports evaluation via large multimodal language models (MLLMs) like GPT-4V and ShareGPT4V for complex compositions. This modular design allows for granular assessment of different compositional aspects.
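The modular, category-to-metric design described above can be sketched as a simple dispatcher. This is an illustrative sketch only: the function names, signatures, and category keys are assumptions, and the metric bodies are placeholders standing in for the real BLIP-VQA, UniDet, and CLIPScore calls.

```python
# Hypothetical sketch of the benchmark's modular design: each compositional
# category is scored by its own metric function. The metric bodies are
# stand-ins; in the real framework they would invoke BLIP-VQA, UniDet,
# or CLIPScore. The dispatch structure is the point.
from typing import Callable, Dict


def score_attribute_binding(prompt: str, image_path: str) -> float:
    # Real benchmark: BLIP-VQA answers questions like
    # "is the <object> <attribute>?" about the generated image.
    return 0.0  # placeholder


def score_spatial(prompt: str, image_path: str) -> float:
    # Real benchmark: UniDet detects objects and checks their
    # 2D/3D relations (and counts objects for numeracy).
    return 0.0  # placeholder


def score_non_spatial(prompt: str, image_path: str) -> float:
    # Real benchmark: CLIPScore between prompt and image embeddings.
    return 0.0  # placeholder


# Category keys here are illustrative, not the project's exact names.
METRICS: Dict[str, Callable[[str, str], float]] = {
    "color": score_attribute_binding,
    "shape": score_attribute_binding,
    "texture": score_attribute_binding,
    "spatial": score_spatial,
    "numeracy": score_spatial,
    "non_spatial": score_non_spatial,
}


def evaluate(category: str, prompt: str, image_path: str) -> float:
    """Route a (prompt, image) pair to the metric for its category."""
    return METRICS[category](prompt, image_path)
```

Because each category maps to an independent scoring function, metrics can be swapped or extended (for example, routing complex compositions to an MLLM-based scorer) without touching the rest of the pipeline.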
Quick Start & Requirements
Requires diffusers==0.15.0.dev0 (install from PyPI or source), accelerate, and the specific dependencies listed in requirements.txt within the examples directory.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The setup for MLLM evaluation, particularly ShareGPT4V, requires significant environment configuration and weight downloads. The project relies on specific versions of libraries such as diffusers, which might require careful dependency management for compatibility with other projects.
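One common way to contain the pinned diffusers version is to give the benchmark its own virtual environment. This is a setup sketch under assumptions: the dev-release pin may only be installable from source, and the requirements.txt path follows the Quick Start above.

```shell
# Illustrative only: isolate T2I-CompBench's pinned dependencies
# so they do not clash with other projects' diffusers versions.
python -m venv t2i-compbench-env
source t2i-compbench-env/bin/activate
# dev releases may not be on PyPI; installing from source may be needed
pip install "diffusers==0.15.0.dev0" accelerate
pip install -r examples/requirements.txt
```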