T2I-CompBench by Karine-Huang

Evaluation benchmark for compositional text-to-image generation

created 2 years ago
276 stars

Top 94.7% on sourcepulse

Project Summary

T2I-CompBench(++) is a benchmark and evaluation framework for compositional text-to-image generation. It standardizes the evaluation of prompts that combine attributes, spatial relationships, and numeracy, and is aimed at researchers and developers working on image generation models.

How It Works

The framework scores each compositional category with a dedicated evaluator: BLIP-VQA for attribute binding, UniDet for 2D/3D spatial relationships and numeracy, and CLIPScore for non-spatial relationships. Complex compositions can additionally be evaluated with multimodal large language models (MLLMs) such as GPT-4V and ShareGPT4V. Because each category has its own evaluator, results can be reported per compositional aspect.
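
The category-to-evaluator pairing described above can be sketched as a small lookup table. This is only an illustration of the routing idea, not code from the repository: the category keys (`attribute_binding`, `spatial_2d`, and so on) and the `pick_evaluator` helper are hypothetical names chosen for this example, and the values are plain labels, not loaded models.

```python
from typing import Dict

# Hypothetical category keys; the evaluator labels follow the summary above.
EVALUATOR_BY_CATEGORY: Dict[str, str] = {
    "attribute_binding": "BLIP-VQA",
    "spatial_2d": "UniDet",
    "spatial_3d": "UniDet",
    "numeracy": "UniDet",
    "non_spatial": "CLIPScore",
    # Complex compositions are routed to an MLLM judge (GPT-4V or ShareGPT4V).
    "complex": "MLLM",
}


def pick_evaluator(category: str) -> str:
    """Return the evaluator label for a compositional category."""
    try:
        return EVALUATOR_BY_CATEGORY[category]
    except KeyError:
        known = ", ".join(sorted(EVALUATOR_BY_CATEGORY))
        raise ValueError(f"unknown category {category!r}; expected one of: {known}") from None


if __name__ == "__main__":
    for name in EVALUATOR_BY_CATEGORY:
        print(f"{name}: {pick_evaluator(name)}")
```

Keeping the mapping in one table makes it easy to see which evaluator produced a given per-category score, which matters because the scores come from different models and are not directly comparable across categories.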

Quick Start & Requirements

  • Installation: Requires diffusers==0.15.0.dev0 (install from PyPI or source), accelerate, and specific dependencies listed in requirements.txt within the examples directory.
  • Prerequisites: Python 3.10+ is recommended for MLLM evaluation. UniDet-based metrics require downloading the UniDet expert weights separately.
  • Resources: Setup involves virtual environment creation, dependency installation, and downloading model weights.
  • Links: diffusers, ShareGPT4V
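
Because the setup pins an exact diffusers version, it can help to confirm what is installed before running any evaluation. The snippet below is a minimal sketch using only the standard library; `pin_status` and `installed_version` are hypothetical helpers written for this example, and the comparison is a plain string match with no PEP 440 normalization.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional

# Version pin quoted in the installation notes above.
PINNED_DIFFUSERS = "0.15.0.dev0"


def installed_version(package: str, lookup: Callable[[str], str] = version) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return lookup(package)
    except PackageNotFoundError:
        return None


def pin_status(package: str, pinned: str, lookup: Callable[[str], str] = version) -> str:
    """Classify a package as 'ok', 'mismatch', or 'missing' against a pin."""
    found = installed_version(package, lookup)
    if found is None:
        return "missing"
    return "ok" if found == pinned else "mismatch"


if __name__ == "__main__":
    print("diffusers:", pin_status("diffusers", PINNED_DIFFUSERS))
```

The `lookup` parameter exists only so the check can be exercised without installing anything; in normal use the default `importlib.metadata.version` is sufficient.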

Highlighted Details

  • Evaluation metrics adopted by Stable Diffusion 3, DALL-E 3, and PixArt-α.
  • Includes human evaluation of image-score pairs.
  • Supports LoRA finetuning and the GORS method.
  • Provides inference code for generating images for metric calculation.

Maintenance & Community

  • T2I-CompBench++ accepted to TPAMI.
  • Active development with updates on evaluation results and benchmark versions.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • Licensed under the MIT License.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

MLLM evaluation, particularly with ShareGPT4V, requires substantial environment configuration and weight downloads. The project also depends on specific library versions, such as the pinned diffusers release, so an isolated environment may be needed to avoid conflicts with other projects.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 24 stars in the last 90 days
