lmms-eval by EvolvingLMMs-Lab

LMM evaluation toolkit for text, image, video, and audio tasks

created 1 year ago
2,839 stars

Top 17.1% on sourcepulse

Project Summary

This toolkit provides a comprehensive framework for evaluating Large Multimodal Models (LMMs) across text, image, video, and audio modalities. It aims to accelerate LMM development by offering a unified interface for over 90 tasks and 30 models, benefiting researchers and developers in the LMM space.

How It Works

lmms-eval is a fork of lm-evaluation-harness that adapts its efficient design to multimodal models. It tames the sprawl of scattered multimodal benchmarks by providing integrated data and model interfaces. Key architectural changes include deferring image processing to the model's response phase to manage memory, and a separate class for each LMM, since Hugging Face models lack a unified input/output format.
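The per-model class structure described above can be sketched as follows. The class and method names here are illustrative assumptions, not lmms-eval's actual API; they only demonstrate the adapter pattern and the lazy image handling the README describes:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of the per-model adapter pattern. Names are
# illustrative; consult the lmms-eval source for the real interface.

class LMMAdapter(ABC):
    """Base interface each model hides its own input/output format behind."""

    @abstractmethod
    def generate(self, text: str, visuals: list) -> str:
        """Run the model on a prompt plus raw visuals (paths, arrays, ...)."""

class EchoLMM(LMMAdapter):
    """Toy adapter that 'loads' images lazily at response time."""

    def generate(self, text: str, visuals: list) -> str:
        # Visuals are only materialized here, during the response phase,
        # rather than being preprocessed for the whole benchmark up front.
        loaded = [f"<image:{v}>" for v in visuals]
        return " ".join(loaded) + " " + text

model = EchoLMM()
print(model.generate("Describe the scene.", ["img1.png"]))
```

Each new model then only needs its own subclass; the harness drives every adapter through the same generate-style entry point.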

Quick Start & Requirements

  • Install via pip: uv pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
  • Python 3.12 is recommended.
  • For caption datasets (coco, refcoco, nocaps), Java 1.8.0 is required (conda install openjdk=8).
  • Environment variables for API keys (OpenAI, Hugging Face tokens) are often necessary.
  • See Documentation for detailed usage and examples.
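Because several tasks fail late when credentials are absent, a small preflight check can save a wasted run. The variable names below (OPENAI_API_KEY, HF_TOKEN) are common conventions, not names mandated by lmms-eval; confirm the exact variables your chosen tasks expect:

```python
import os

# Hypothetical preflight check: verify common credentials before launching
# an evaluation. Variable names are conventional, not lmms-eval requirements.
REQUIRED = ["OPENAI_API_KEY", "HF_TOKEN"]

def missing_env_vars(required=REQUIRED):
    """Return the subset of required environment variables that are unset."""
    return [name for name in required if not os.environ.get(name)]

missing = missing_env_vars()
if missing:
    print("Set these before running lmms-eval:", ", ".join(missing))
else:
    print("All required API credentials are set.")
```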

Highlighted Details

  • Supports over 90 tasks and 30 models, including recent additions like audio evaluation (Aero-1-Audio, Qwen2-Audio) and video benchmarks (Video-MMMU, TemporalBench).
  • Integrates vLLM for accelerated evaluations and supports OpenAI-compatible APIs.
  • Provides scripts for reproducing LLaVA-1.5 paper results and detailed evaluation results for LLaVA family models.
  • Offers an SGLang Runtime API for specific models and tensor parallelism for larger models.
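Support for OpenAI-compatible APIs means an evaluated (or judging) model is reached over the standard chat-completions protocol. A minimal sketch of such a multimodal request body follows; the model name and image URL are placeholders, not lmms-eval defaults:

```python
import json

# Sketch of an OpenAI-compatible chat-completions payload carrying an image.
# The model name and URL below are placeholders for illustration only.
def build_request(model: str, prompt: str, image_url: str) -> dict:
    """Build a multimodal chat-completions request body."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

body = build_request("my-lmm", "What is in this image?",
                     "https://example.com/cat.png")
print(json.dumps(body, indent=2))
```

Any server that accepts this payload shape, such as a vLLM-backed endpoint, can slot in as an evaluation target.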

Maintenance & Community

The project is actively maintained with frequent updates and contributions. Recent announcements highlight new model and task integrations. Community engagement is encouraged via GitHub issues and PRs. A Discord server is available at discord/lmms-eval.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Given its origin as a fork of lm-evaluation-harness (which is Apache 2.0 licensed), users should verify the specific license terms for commercial use or closed-source linking.

Limitations & Caveats

The README notes that the lack of unified input/output formats for LMMs in Hugging Face necessitates creating a new class for each model, which is acknowledged as suboptimal and planned for future unification. Some environment configurations (e.g., specific numpy or protobuf versions) may require manual adjustments to avoid errors.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 28
  • Issues (30d): 22
  • Star History: 446 stars in the last 90 days
