Evaluation framework for text-to-image alignment research
GenEval provides an object-focused framework for evaluating text-to-image alignment, addressing the limitations of holistic metrics like FID and CLIPScore. It enables fine-grained, instance-level analysis of compositional capabilities, such as object co-occurrence, position, count, and color, making it valuable for researchers and developers of text-to-image models.
How It Works
GenEval leverages existing object detection models to analyze generated images. This approach allows for a granular assessment of how well generated images adhere to specific compositional instructions in the text prompt. By integrating with object detectors, it provides instance-level feedback on properties like object presence, spatial relationships, and attributes, offering a more insightful evaluation than global metrics.
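To make the idea concrete, here is a minimal sketch of instance-level checking against compositional prompt constraints. The detection format, function names, and checks are hypothetical illustrations, not GenEval's actual API; it assumes an object detector has already produced labeled, colored bounding boxes.

```python
# Hypothetical sketch of object-focused evaluation: detections are assumed
# to come from an upstream object detector (e.g. a Mask2Former model).
from dataclasses import dataclass

@dataclass
class Detection:
    label: str   # detected class, e.g. "apple"
    color: str   # dominant color of the instance crop
    box: tuple   # (x_min, y_min, x_max, y_max)

def check_count(dets, label, expected):
    """Does the image contain exactly `expected` instances of `label`?"""
    return sum(d.label == label for d in dets) == expected

def check_color(dets, label, color):
    """Is at least one `label` instance rendered in `color`?"""
    return any(d.label == label and d.color == color for d in dets)

def check_left_of(dets, left_label, right_label):
    """Is some `left_label` instance entirely left of some `right_label`?"""
    lefts = [d for d in dets if d.label == left_label]
    rights = [d for d in dets if d.label == right_label]
    return any(l.box[2] < r.box[0] for l in lefts for r in rights)

# Example: prompt "two red apples to the left of a dog"
dets = [
    Detection("apple", "red", (10, 40, 60, 90)),
    Detection("apple", "red", (70, 40, 120, 90)),
    Detection("dog", "brown", (200, 30, 320, 150)),
]
passed = (check_count(dets, "apple", 2)
          and check_color(dets, "apple", "red")
          and check_left_of(dets, "apple", "dog"))
```

Each check yields a per-instance pass/fail signal, which is what lets this style of evaluation pinpoint the specific compositional skill a model gets wrong, unlike a single global score.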
Quick Start & Requirements
Setup requires creating the conda environment (conda env create -f environment.yml, then conda activate geneval), installing mmdetection (version 2.x), and downloading a Mask2Former object detector model.

Highlighted Details
Maintenance & Community
The project is associated with the paper "GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment" by Dhruba Ghosh, Hanna Hajishirzi, and Ludwig Schmidt. Further community or maintenance details are not explicitly provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.
Limitations & Caveats
The README notes that while recent models have improved significantly, they still struggle with complex capabilities such as spatial relations and attribute binding; these are precisely the failure modes GenEval is designed to surface.