Video-MME by MME-Benchmarks

Evaluation benchmark for multimodal LLMs in video analysis

created 1 year ago
611 stars

Top 54.5% on sourcepulse

Project Summary

Video-MME is a comprehensive benchmark designed to evaluate the capabilities of Multimodal Large Language Models (MLLMs) in video analysis. It addresses the limited exploration of MLLMs' ability to process sequential visual data by offering a full-spectrum evaluation across diverse video durations, types, and modalities. The benchmark is valuable for researchers and developers who want to advance and assess MLLMs on video-understanding tasks.

How It Works

Video-MME comprises 900 videos totaling 254 hours, with 2,700 human-annotated question-answer pairs. It distinguishes itself through its temporal dimension (short, medium, long videos), diversity in video types (6 primary domains, 30 subfields), breadth in data modalities (video frames, subtitles, audio), and high-quality, novel annotations. The evaluation pipeline involves extracting frames and subtitles, using a standardized prompt format, and then evaluating model responses against ground truth using provided scripts.
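As an illustration of that pipeline, here is a minimal sketch in Python of uniform frame sampling and multiple-choice prompt assembly. The function names, the eight-frame budget, and the prompt wording are assumptions for the example, not the repository's exact scripts (requires opencv-python):

```python
# Illustrative sketch only: uniform frame sampling plus a multiple-choice
# prompt template. The exact prompt wording and frame budget used by the
# official Video-MME scripts may differ.
import cv2

def sample_frames(video_path: str, num_frames: int = 8):
    """Uniformly sample frames across the whole video with OpenCV."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total - 1, 1) / max(num_frames - 1, 1)
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * step))
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames

def build_prompt(question: str, options: list[str], subtitles: str = "") -> str:
    """Assemble a standardized multiple-choice prompt (wording is illustrative)."""
    parts = []
    if subtitles:
        parts.append(f"This video's subtitles are listed below:\n{subtitles}")
    parts.append("Select the best answer to the following multiple-choice "
                 "question based on the video.")
    parts.append(question)
    parts.extend(options)
    parts.append("Respond with only the letter of the correct option.")
    return "\n".join(parts)
```

Sampling frames uniformly rather than decoding every frame keeps the 254-hour corpus tractable; models that accept more frames can simply raise the budget.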

Quick Start & Requirements

  • Installation: The benchmark itself has no installation step; Python evaluation scripts are provided in the repository (a scoring sketch follows this list).
  • Prerequisites: Access to the dataset (900 videos, 744 subtitles) is required. Evaluation scripts are in Python.
  • Resources: Processing 254 hours of video data will require significant storage and computational resources for frame extraction and model inference.
  • Links: Project Page, arXiv Paper, Dataset, MME-Survey, Leaderboard
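For the scoring step, a minimal sketch of accuracy computation against the ground-truth answers might look like the following. The JSON field names and file layout are assumptions for illustration, not the repository's actual schema:

```python
# Hypothetical scoring sketch: compares predicted option letters against
# ground truth. Field names ("question_id", "answer") and file paths are
# illustrative assumptions, not the official evaluation script.
import json

def accuracy(pred_path: str, gt_path: str) -> float:
    """Fraction of questions whose predicted letter matches the ground truth."""
    with open(pred_path) as f:
        preds = {p["question_id"]: p["answer"].strip().upper()
                 for p in json.load(f)}
    with open(gt_path) as f:
        gts = {g["question_id"]: g["answer"].strip().upper()
               for g in json.load(f)}
    correct = sum(1 for qid, ans in gts.items() if preds.get(qid) == ans)
    return correct / max(len(gts), 1)

print(f"Accuracy: {accuracy('predictions.json', 'ground_truth.json'):.2%}")
```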

Highlighted Details

  • First-ever comprehensive evaluation benchmark for MLLMs in video analysis, accepted by CVPR 2025.
  • Covers video durations from 11 seconds to 1 hour, spanning short, medium, and long-term contexts.
  • Includes multi-modal inputs like subtitles and audio alongside video frames.
  • Used as an "industry standard measure" of long-context ability in OpenAI's GPT-4o evaluations.

Maintenance & Community

The project is associated with the MME-Survey and MMBench teams. The primary contact for issues and leaderboard submissions is videomme2024@gmail.com.

Licensing & Compatibility

Video-MME is strictly for academic research use; commercial use is prohibited. Copyright of videos belongs to their respective owners. Distribution, publication, copying, dissemination, or modification of the benchmark without prior approval is forbidden.

Limitations & Caveats

The dataset is restricted to academic research; commercial use is prohibited, and users must comply with strict distribution and modification restrictions. Video content belongs to external owners, and the project maintains a process for addressing copyright-infringement claims.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History: 75 stars in the last 90 days
