Q-Bench by Q-Future

Benchmark for multi-modality LLMs (MLLMs) on low-level vision tasks

created 1 year ago
271 stars

Top 95.8% on sourcepulse

View on GitHub
Project Summary

Q-Bench is an ICLR 2024 Spotlight benchmark designed to evaluate the low-level vision capabilities of multimodal large language models (MLLMs). It addresses the gap in assessing MLLMs on tasks like image perception, description, and quality assessment, providing a standardized evaluation framework for researchers and developers.

How It Works

Q-Bench comprises three evaluation realms: Perception (A1), Description (A2), and Assessment (A3). A1 and A2 utilize custom datasets (LLVisionQA and LLDescribe) with submission-based evaluation, while A3 offers abstract evaluation code for assessing image quality using public datasets. The benchmark supports both single-image and image-pair comparisons, enabling comprehensive analysis of MLLM performance across various low-level vision tasks.
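For illustration, here is a minimal sketch of how a Perception (A1) sample could be wrapped into a multiple-choice prompt for an MLLM. The field names (question, candidates) and the example content are assumptions for illustration only and may not match the actual LLVisionQA schema.

```python
# Hypothetical sketch: turning a Q-Bench perception (A1) sample into an MCQ prompt.
# Field names below are illustrative assumptions, not the confirmed dataset schema.
def build_mcq_prompt(sample: dict) -> str:
    letters = "ABCD"
    options = "\n".join(
        f"{letters[i]}. {choice}" for i, choice in enumerate(sample["candidates"])
    )
    return (
        f"{sample['question']}\n{options}\n"
        "Answer with the letter of the best option."
    )

example = {
    "question": "How is the clarity of this image?",
    "candidates": ["Very clear", "Acceptable", "Blurry"],
}
print(build_mcq_prompt(example))
```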

Quick Start & Requirements

  • Install the Hugging Face datasets library via pip: pip install datasets.
  • Datasets can be loaded with the Hugging Face datasets API: load_dataset("q-future/Q-Bench-HF") and load_dataset("q-future/Q-Bench2-HF"); see the sketch after this list.
  • Official evaluation scripts and submission guidelines are available on the project's GitHub repository.
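A minimal sketch of loading the benchmark data through the Hugging Face datasets API, as referenced above. It assumes datasets is installed and downloads from the Hub; the available splits and columns depend on the dataset configuration, so inspect the returned objects before use.

```python
# Minimal sketch: loading Q-Bench data from the Hugging Face Hub.
# Requires: pip install datasets
from datasets import load_dataset

# Single-image benchmark
qbench = load_dataset("q-future/Q-Bench-HF")

# Image-pair comparison benchmark
qbench2 = load_dataset("q-future/Q-Bench2-HF")

# Inspect the available splits and columns before building prompts.
print(qbench)
print(qbench2)
```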

Highlighted Details

  • Evaluates GPT-4V, Gemini-Pro, Qwen-VL-Plus, and 16 open-source MLLMs.
  • Benchmarks performance on perception, description, and visual quality assessment tasks.
  • Supports evaluation via Hugging Face datasets API and integration with lmms-eval.
  • Offers submission-based evaluation for custom models and results.

Maintenance & Community

The project is associated with Nanyang Technological University and Shanghai Jiao Tong University. Updates and new releases are frequently announced, indicating active development. Contact information for the first authors is provided for queries.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification.

Limitations & Caveats

The README does not specify the exact license, which may impact commercial adoption. While submission-based evaluation is supported, detailed instructions for integrating arbitrary models might require further investigation.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

5 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Luca Antiga (CTO of Lightning AI), and 4 more.

helm by stanford-crfm

Top 0.9% · 2k stars
Open-source Python framework for holistic evaluation of foundation models
created 3 years ago
updated 1 day ago