Benchmark for multi-modality LLMs (MLLMs) on low-level vision tasks
Q-Bench is an ICLR 2024 Spotlight benchmark designed to evaluate the low-level vision capabilities of multimodal large language models (MLLMs). It addresses the gap in assessing MLLMs on tasks like image perception, description, and quality assessment, providing a standardized evaluation framework for researchers and developers.
How It Works
Q-Bench comprises three evaluation realms: Perception (A1), Description (A2), and Assessment (A3). A1 and A2 utilize custom datasets (LLVisionQA and LLDescribe) with submission-based evaluation, while A3 offers abstract evaluation code for assessing image quality using public datasets. The benchmark supports both single-image and image-pair comparisons, enabling comprehensive analysis of MLLM performance across various low-level vision tasks.
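To make the scoring concrete, below is a minimal Python sketch of how A1 and A3 results are typically computed: multiple-choice accuracy for Perception and SRCC/PLCC correlation against human mean opinion scores (MOS) for Assessment. The field names and toy values are illustrative assumptions, not the official Q-Bench schema or evaluation code.

# Hedged sketch: A1 scored as multiple-choice accuracy, A3 as correlation
# between model-predicted quality scores and human MOS. Field names are assumed.
from scipy.stats import spearmanr, pearsonr

def score_perception(records):
    # A1 (Perception): fraction of LLVisionQA-style questions answered correctly.
    hits = sum(1 for r in records if r["model_answer"].strip() == r["correct_answer"].strip())
    return hits / len(records)

def score_assessment(predicted_scores, mos_scores):
    # A3 (Assessment): Spearman (rank) and Pearson (linear) correlation with MOS.
    srcc, _ = spearmanr(predicted_scores, mos_scores)
    plcc, _ = pearsonr(predicted_scores, mos_scores)
    return {"SRCC": srcc, "PLCC": plcc}

# Toy usage (illustrative values only)
print(score_perception([
    {"model_answer": "B", "correct_answer": "B"},
    {"model_answer": "A", "correct_answer": "C"},
]))
print(score_assessment([0.71, 0.32, 0.88, 0.15], [3.9, 2.1, 4.5, 1.2]))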
Quick Start & Requirements
Install the Hugging Face datasets library:
pip install datasets
The benchmark data is then available through the datasets API via load_dataset("q-future/Q-Bench-HF") for single-image tasks and load_dataset("q-future/Q-Bench2-HF") for image-pair tasks.
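A short, hedged loading example follows; the dataset IDs are taken from the README above, while split names and record fields are assumptions that should be checked against the actual dataset cards.

# Load Q-Bench data from the Hugging Face Hub and inspect one record.
# Dataset IDs come from the README; field names vary, so inspect before use.
from datasets import load_dataset

qbench = load_dataset("q-future/Q-Bench-HF")    # single-image benchmark
qbench2 = load_dataset("q-future/Q-Bench2-HF")  # image-pair benchmark

print(qbench)                         # shows the available splits
first_split = next(iter(qbench))      # pick the first split name
print(qbench[first_split][0].keys())  # check the record schema before evaluating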
Highlighted Details
Data access through the Hugging Face datasets API and integration with lmms-eval for automated evaluation.
Maintenance & Community
The project is associated with Nanyang Technological University and Shanghai Jiao Tong University. Updates and new releases are frequently announced, indicating active development. Contact information for the first authors is provided for queries.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification.
Limitations & Caveats
The README does not specify the exact license, which may impact commercial adoption. While submission-based evaluation is supported, detailed instructions for integrating arbitrary models might require further investigation.