GPT-ImgEval by PicoTrex

Image generation benchmark for GPT-4o

created 4 months ago

286 stars

Top 92.7% on SourcePulse

Project Summary

This repository provides GPT-ImgEval, a comprehensive benchmark for evaluating GPT-4o's image generation capabilities. It targets researchers and developers interested in assessing and understanding the performance of state-of-the-art multimodal models in text-to-image and image editing tasks, offering detailed analysis and quantitative results.

How It Works

The project evaluates GPT-4o's image generation on three benchmarks: GenEval (text-to-image), Reason-Edit (image editing), and WISE (world knowledge-informed generation). Because no direct API is available, an automated script drives the ChatGPT desktop app to process prompts in batches. The analysis suggests that GPT-4o uses a diffusion-based decoder, and it also speculates about the design of the encoder.
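The batch-processing idea can be illustrated with a minimal sketch. This is not the repository's script: run_batch, the generate callable, and the numbered .png file names are hypothetical and introduced here only for illustration. The sketch assumes that whatever drives the interface can be wrapped in a function that takes a prompt and returns image bytes.

```python
from pathlib import Path


def run_batch(prompts, generate, out_dir):
    """Call generate(prompt) for each prompt and save the returned bytes.

    `generate` is a hypothetical callable standing in for whatever drives
    the interface. Prompts whose output file already exists are skipped,
    so an interrupted run can be resumed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, prompt in enumerate(prompts):
        target = out_dir / f"{index:05d}.png"  # hypothetical naming scheme
        if target.exists():
            continue  # already generated in an earlier run
        target.write_bytes(generate(prompt))
        written.append(target)
    return written
```

Skipping existing outputs matters for UI-driven automation, where a run can stall partway through and need to be restarted.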

Quick Start & Requirements

Install/Run: Requires macOS with an M1/M2/M3/M4 chip and the ChatGPT desktop app installed.
Setup: Involves modifying config.json with window positions obtained via get_position.py.
Usage: Run via python chatgpt_script.py --config_path <path_to_config>.
Links: Paper, Dataset, Code
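As a rough illustration of the setup and usage steps above, the following sketch parses a --config_path argument and loads a JSON config. The key names input_box and send_button are assumptions made for this example; the actual schema of config.json is not described here and may differ.

```python
import argparse
import json
from pathlib import Path

# Hypothetical key names; the real config.json schema may differ.
REQUIRED_KEYS = ("input_box", "send_button")


def load_config(path):
    """Read a JSON config and check that each required key holds an [x, y] pair."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    for key in REQUIRED_KEYS:
        value = config.get(key)
        is_pair = (
            isinstance(value, list)
            and len(value) == 2
            and all(isinstance(v, (int, float)) for v in value)
        )
        if not is_pair:
            raise ValueError(f"config key {key!r} must be an [x, y] pair, got {value!r}")
    return config


def parse_args(argv=None):
    """Parse the same --config_path flag that the usage line mentions."""
    parser = argparse.ArgumentParser(description="Illustrative config loader")
    parser.add_argument("--config_path", required=True, help="path to config.json")
    return parser.parse_args(argv)
```

Calling load_config(parse_args().config_path) at startup would surface a missing or malformed coordinate before any UI automation begins.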

Highlighted Details

GPT-4o achieves an overall score of 0.84 on GenEval, outperforming previous methods.
Demonstrates a 0.929 score on Reason-Edit, a significant improvement over prior art.
Exhibits strong performance on WISE, combining world knowledge with high-fidelity image generation.
Analysis suggests GPT-4o may use a diffusion-based decoder.

Maintenance & Community

The authors come from multiple institutions, which indicates academic backing. The README does not mention any community engagement channels.

Licensing & Compatibility

Use of the automation script is likely subject to the terms of use of the ChatGPT desktop application. The dataset is available for download. The README does not specify an open-source license for the code.

Limitations & Caveats

The automated script is macOS-specific and relies on the ChatGPT desktop application's UI, making it potentially fragile to UI changes or variations in window positioning. Users must manually configure coordinates for reliable operation.
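One way to reduce this fragility is to sanity-check configured coordinates before a run. The sketch below is an assumption-laden illustration, not part of the repository: out_of_bounds and the point names are hypothetical, and the caller is assumed to supply the screen size.

```python
def out_of_bounds(points, screen_width, screen_height):
    """Return the names of points that fall outside a screen of the given size.

    `points` maps a hypothetical label (e.g. "input_box") to an (x, y) pair.
    """
    return [
        name
        for name, (x, y) in points.items()
        if not (0 <= x < screen_width and 0 <= y < screen_height)
    ]
```

A check like this only catches coordinates that are off-screen; it cannot detect a window that has moved or a UI layout that has changed.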

Health Check

Last commit: 3 months ago

Responsiveness: Inactive

Star History

35 stars in the last 90 days