Q-Bench by Q-Future

Benchmark for multi-modality LLMs (MLLMs) on low-level vision tasks

Created 2 years ago

281 stars

Top 92.8% on SourcePulse

Project Summary

Q-Bench is an ICLR 2024 Spotlight benchmark designed to evaluate the low-level vision capabilities of multimodal large language models (MLLMs). It addresses the gap in assessing MLLMs on tasks like image perception, description, and quality assessment, providing a standardized evaluation framework for researchers and developers.

How It Works

Q-Bench comprises three evaluation realms: Perception (A1), Description (A2), and Assessment (A3). A1 and A2 utilize custom datasets (LLVisionQA and LLDescribe) with submission-based evaluation, while A3 offers abstract evaluation code for assessing image quality using public datasets. The benchmark supports both single-image and image-pair comparisons, enabling comprehensive analysis of MLLM performance across various low-level vision tasks.

Quick Start & Requirements

Install via pip: pip install datasets Q-Bench
Datasets can be loaded using Hugging Face datasets API: load_dataset("q-future/Q-Bench-HF") and load_dataset("q-future/Q-Bench2-HF").
Official evaluation scripts and submission guidelines are available on the project's GitHub repository.

Highlighted Details

Evaluates 16 MLLMs including GPT-4V, Gemini-Pro, and Qwen-VL-Plus.
Benchmarks performance on perception, description, and visual quality assessment tasks.
Supports evaluation via Hugging Face datasets API and integration with lmms-eval.
Offers submission-based evaluation for custom models and results.

Maintenance & Community

The project is associated with Nanyang Technological University and Shanghai Jiaotong University. Updates and new releases are frequently announced, indicating active development. Contact information for the first authors is provided for queries.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification.

Limitations & Caveats

The README does not specify the exact license, which may impact commercial adoption. While submission-based evaluation is supported, detailed instructions for integrating arbitrary models might require further investigation.

Q-Bench by Q-Future

Explore Similar Projects

AesBench by yipoh

lens by ContextualAI

MMVP by tsb0601

MMBench by open-compass

Visual-CoT by deepcs233

RLAIF-V by RLHF-V

VoRA by Hon-Wong

Q-Align by Q-Future

HuatuoGPT-Vision by FreedomIntelligence

Multi-Modality-Arena by OpenGVLab

thinking-in-space by vision-x-nyu

VLM_survey by jingyi0000