Image generation benchmark for GPT-4o
This repository provides GPT-ImgEval, a comprehensive benchmark for evaluating GPT-4o's image generation capabilities. It targets researchers and developers interested in assessing and understanding the performance of state-of-the-art multimodal models in text-to-image and image editing tasks, offering detailed analysis and quantitative results.
How It Works
The project evaluates GPT-4o's image generation on benchmarks including GenEval (text-to-image), Reason-Edit (image editing), and WISE (world knowledge-informed generation). Because GPT-4o image generation had no direct API, an automated script drives the ChatGPT user interface to process prompts in batches. The authors' analysis suggests that GPT-4o uses a diffusion-based component for image decoding; their account of the encoder side is presented as speculation rather than as a confirmed finding.
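The batch processing described above can be pictured as a resumable loop over benchmark prompts. The following is only a minimal sketch and not the repository's script: `run_batch`, the `(prompt_id, prompt)` pairs, and the `generate` callback are hypothetical names standing in for whatever actually drives the ChatGPT interface.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_batch(prompts: Iterable[tuple[str, str]],
              out_dir: Path,
              generate: Callable[[str], bytes]) -> list[str]:
    """Save one image per (prompt_id, prompt) pair, skipping finished items.

    `generate` is a hypothetical callback that returns image bytes for a
    prompt; in a UI-driven setup it would wrap the automation step.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    newly_done = []
    for prompt_id, prompt in prompts:
        target = out_dir / f"{prompt_id}.png"
        if target.exists():
            # Produced by an earlier run, so resume without redoing it.
            continue
        target.write_bytes(generate(prompt))
        newly_done.append(prompt_id)
    return newly_done
```

Keying output files by prompt ID lets an interrupted run be restarted without repeating work, which matters when each item depends on a slow and fragile UI interaction.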
Quick Start & Requirements
Set up config.json with window positions obtained via get_position.py.
Run: python chatgpt_script.py --config_path <path_to_config>
Highlighted Details
Maintenance & Community
The project is associated with authors from multiple institutions, indicating academic backing. The README does not mention any community engagement channels.
Licensing & Compatibility
No open-source license is specified for the provided code. Because the script automates the ChatGPT desktop application, its use is likely subject to that application's terms of use. The dataset is available for download.
Limitations & Caveats
The automated script is macOS-specific and relies on the ChatGPT desktop application's UI, making it potentially fragile to UI changes or variations in window positioning. Users must manually configure coordinates for reliable operation.
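Since misplaced coordinates are the main failure mode, a sanity check before a long run can help. The sketch below assumes a configuration made of named `[x, y]` points; the real schema of config.json is not described here, so `load_positions` and the keys `input_box` and `send_button` are purely illustrative.

```python
import json


def load_positions(text: str, screen_w: int, screen_h: int) -> dict[str, tuple[int, int]]:
    """Parse a JSON object of named [x, y] points and check they fit on screen.

    The key names and the [x, y] layout are assumptions for illustration,
    not the repository's actual config format.
    """
    raw = json.loads(text)
    positions = {}
    for name, point in raw.items():
        is_pair = isinstance(point, list) and len(point) == 2
        if not is_pair or not all(isinstance(v, (int, float)) for v in point):
            raise ValueError(f"{name}: expected [x, y], got {point!r}")
        x, y = point
        if not (0 <= x < screen_w and 0 <= y < screen_h):
            raise ValueError(f"{name}: ({x}, {y}) is outside a {screen_w}x{screen_h} screen")
        positions[name] = (int(x), int(y))
    return positions


if __name__ == "__main__":
    # Hypothetical config contents and screen size.
    example = '{"input_box": [400, 900], "send_button": [1200, 900]}'
    print(load_positions(example, screen_w=1440, screen_h=1000))
```

Failing early on an out-of-range point is cheaper than discovering midway through a batch that clicks were landing outside the application window.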