geneval  by djghosh13

Evaluation framework for text-to-image alignment research

created 1 year ago
331 stars

Top 83.8% on sourcepulse

Project Summary

GenEval provides an object-focused framework for evaluating text-to-image alignment, addressing the limitations of holistic metrics like FID and CLIPScore. It enables fine-grained, instance-level analysis of compositional capabilities, such as object co-occurrence, position, count, and color, making it valuable for researchers and developers of text-to-image models.

How It Works

GenEval runs existing object detection models on generated images and checks the detections against the compositional instructions in the text prompt. This yields instance-level feedback on object presence, spatial relationships, and attributes, which is more informative than a single global score such as FID or CLIPScore.
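As a rough illustration of detector-based checking, the sketch below verifies object presence and count against what a prompt requires. It is not GenEval's actual code: the Detection class, the check_counts function, and the 0.3 confidence threshold are all hypothetical, and the detections are hand-written stand-ins for real detector output.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object instance (hypothetical structure, not GenEval's own)."""
    label: str
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def check_counts(detections, required, min_score=0.3):
    """Return (passed, reason) for a prompt that requires exact object counts.

    `required` maps a class name to the expected number of instances,
    e.g. {"dog": 2} for a prompt like "a photo of two dogs".
    The 0.3 threshold is an assumed value chosen for illustration.
    """
    confident = [d for d in detections if d.score >= min_score]
    found = Counter(d.label for d in confident)
    for label, expected in required.items():
        if found[label] != expected:
            return False, f"expected {expected} {label}, found {found[label]}"
    return True, "ok"


# Hand-written stand-ins for detector output on one generated image.
detections = [
    Detection("dog", 0.92, (10, 40, 120, 200)),
    Detection("dog", 0.15, (130, 50, 220, 210)),  # below threshold, ignored
    Detection("cat", 0.88, (230, 60, 300, 190)),
]

print(check_counts(detections, {"dog": 2}))
print(check_counts(detections, {"dog": 1, "cat": 1}))
```

Returning a reason alongside the pass/fail flag is what makes the feedback instance-level: a failure says which object was missing or miscounted rather than only lowering a score.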

Quick Start & Requirements

  • Install: Clone the repository, create and activate the Conda environment (conda env create -f environment.yml, then conda activate geneval), and install mmdetection (version 2.x).
  • Prerequisites: Python 3.x, Conda, mmdetection 2.x, and a downloaded Mask2Former object detector model.
  • Links: Official GitHub Repo

Highlighted Details

  • Evaluates compositional properties: object co-occurrence, position, count, and color.
  • Demonstrates strong human agreement for its automated evaluation.
  • Provides instance-level analysis of text-to-image generation capabilities.
  • Benchmarks several open-source text-to-image models, showing improvements but also persistent challenges in complex spatial relations and attribute binding.
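To make the position property concrete, here is a minimal sketch of a spatial-relation check on two bounding boxes. It assumes a simple rule that compares box centers; the rule GenEval actually applies may differ (for example, by requiring a margin between the boxes), and the function names and relation strings are hypothetical.

```python
def box_center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def relation_holds(box_a, box_b, relation):
    """Return True if box_a stands in `relation` to box_b.

    Uses image coordinates, where y grows downward, so "above"
    means a smaller y value. Comparing centers is an assumed rule.
    """
    ax, ay = box_center(box_a)
    bx, by = box_center(box_b)
    if relation == "left of":
        return ax < bx
    if relation == "right of":
        return ax > bx
    if relation == "above":
        return ay < by
    if relation == "below":
        return ay > by
    raise ValueError(f"unknown relation: {relation}")


# Hypothetical boxes for a prompt like "a dog to the left of a cat".
dog_box = (10, 40, 120, 200)
cat_box = (230, 60, 300, 190)
print(relation_holds(dog_box, cat_box, "left of"))
```

A check like this only runs once both objects have been detected, which is one reason spatial prompts are harder to satisfy than single-object prompts.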

Maintenance & Community

The project is associated with the paper "GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment" by Dhruba Ghosh, Hanna Hajishirzi, and Ludwig Schmidt. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

According to the README, recent models have improved significantly but still struggle with complex capabilities such as spatial relations and attribute binding. These are the areas where GenEval is most useful for exposing current model limitations.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 86 stars in the last 90 days
