Evaluation benchmark for compositional text-to-image generation
Top 94.7% on sourcepulse
T2I-CompBench(++) provides a comprehensive benchmark and evaluation framework for compositional text-to-image generation models. It addresses the need for standardized evaluation of complex prompts involving attributes, spatial relationships, and numeracy, targeting researchers and developers in the AI image generation space. The benchmark offers a robust methodology for assessing model performance on these challenging compositional tasks.
How It Works
The framework employs a multi-faceted evaluation approach using specialized models and metrics. It leverages BLIP-VQA for attribute binding, UniDet for 2D/3D spatial relationships and numeracy, and CLIPScore for non-spatial relationships. Additionally, it supports evaluation via large multimodal language models (MLLMs) like GPT-4V and ShareGPT4V for complex compositions. This modular design allows for granular assessment of different compositional aspects.
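The modular, category-to-metric design described above can be sketched as a simple dispatcher. This is an illustrative sketch only: the function names, signatures, and category keys are assumptions, and the metric bodies are placeholders standing in for the real BLIP-VQA, UniDet, and CLIPScore calls.

```python
# Hypothetical sketch of the benchmark's modular design: each compositional
# category is scored by its own metric function. The metric bodies are
# stand-ins; in the real framework they would invoke BLIP-VQA, UniDet,
# or CLIPScore. The dispatch structure is the point.
from typing import Callable, Dict


def score_attribute_binding(prompt: str, image_path: str) -> float:
    # Real benchmark: BLIP-VQA answers questions like
    # "is the <object> <attribute>?" about the generated image.
    return 0.0  # placeholder


def score_spatial(prompt: str, image_path: str) -> float:
    # Real benchmark: UniDet detects objects and checks their
    # 2D/3D relations (and counts objects for numeracy).
    return 0.0  # placeholder


def score_non_spatial(prompt: str, image_path: str) -> float:
    # Real benchmark: CLIPScore between prompt and image embeddings.
    return 0.0  # placeholder


# Category keys here are illustrative, not the project's exact names.
METRICS: Dict[str, Callable[[str, str], float]] = {
    "color": score_attribute_binding,
    "shape": score_attribute_binding,
    "texture": score_attribute_binding,
    "spatial": score_spatial,
    "numeracy": score_spatial,
    "non_spatial": score_non_spatial,
}


def evaluate(category: str, prompt: str, image_path: str) -> float:
    """Route a (prompt, image) pair to the metric for its category."""
    return METRICS[category](prompt, image_path)
```

Because each category maps to an independent scoring function, metrics can be swapped or extended (for example, routing complex compositions to an MLLM-based scorer) without touching the rest of the pipeline.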
Quick Start & Requirements
Requires diffusers==0.15.0.dev0 (install from PyPI or source), accelerate, and the specific dependencies listed in requirements.txt within the examples directory.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The setup for MLLM evaluation, particularly ShareGPT4V, requires significant environment configuration and weight downloads. The project relies on specific versions of libraries such as diffusers, which might require careful dependency management for compatibility with other projects.
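One common way to contain the pinned diffusers version is to give the benchmark its own virtual environment. This is a setup sketch under assumptions: the dev-release pin may only be installable from source, and the requirements.txt path follows the Quick Start above.

```shell
# Illustrative only: isolate T2I-CompBench's pinned dependencies
# so they do not clash with other projects' diffusers versions.
python -m venv t2i-compbench-env
source t2i-compbench-env/bin/activate
# dev releases may not be on PyPI; installing from source may be needed
pip install "diffusers==0.15.0.dev0" accelerate
pip install -r examples/requirements.txt
```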