lmms-eval by EvolvingLMMs-Lab

LMM evaluation toolkit for text, image, video, and audio tasks

created 1 year ago
2,839 stars

Top 17.1% on sourcepulse

Project Summary

This toolkit provides a comprehensive framework for evaluating Large Multimodal Models (LMMs) across text, image, video, and audio modalities. It aims to accelerate LMM development by offering a unified interface for over 90 tasks and 30 models, benefiting researchers and developers in the LMM space.

How It Works

lmms-eval is a fork of lm-evaluation-harness that adapts its efficient design to multimodal models. It tames the sprawl of scattered multimodal benchmarks by providing integrated data and model interfaces. Key architectural changes include deferring image processing to the model's response phase to manage memory, and a separate class for each LMM, since Hugging Face models lack a unified input/output format.
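The per-model class structure described above can be sketched as follows. The class and method names here are illustrative assumptions, not lmms-eval's actual API; they only demonstrate the adapter pattern and the lazy image handling the README describes:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of the per-model adapter pattern. Names are
# illustrative; consult the lmms-eval source for the real interface.

class LMMAdapter(ABC):
    """Base interface each model hides its own input/output format behind."""

    @abstractmethod
    def generate(self, text: str, visuals: list) -> str:
        """Run the model on a prompt plus raw visuals (paths, arrays, ...)."""

class EchoLMM(LMMAdapter):
    """Toy adapter that 'loads' images lazily at response time."""

    def generate(self, text: str, visuals: list) -> str:
        # Visuals are only materialized here, during the response phase,
        # rather than being preprocessed for the whole benchmark up front.
        loaded = [f"<image:{v}>" for v in visuals]
        return " ".join(loaded) + " " + text

model = EchoLMM()
print(model.generate("Describe the scene.", ["img1.png"]))
```

Each new model then only needs its own subclass; the harness drives every adapter through the same generate-style entry point.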

Quick Start & Requirements

  • Install via pip: uv pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
  • Python 3.12 is recommended.
  • For caption datasets (coco, refcoco, nocaps), Java 1.8.0 is required (conda install openjdk=8).
  • Environment variables for API keys (OpenAI, Hugging Face tokens) are often necessary.
  • See Documentation for detailed usage and examples.
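Because several tasks fail late when credentials are absent, a small preflight check can save a wasted run. The variable names below (OPENAI_API_KEY, HF_TOKEN) are common conventions, not names mandated by lmms-eval; confirm the exact variables your chosen tasks expect:

```python
import os

# Hypothetical preflight check: verify common credentials before launching
# an evaluation. Variable names are conventional, not lmms-eval requirements.
REQUIRED = ["OPENAI_API_KEY", "HF_TOKEN"]

def missing_env_vars(required=REQUIRED):
    """Return the subset of required environment variables that are unset."""
    return [name for name in required if not os.environ.get(name)]

missing = missing_env_vars()
if missing:
    print("Set these before running lmms-eval:", ", ".join(missing))
else:
    print("All required API credentials are set.")
```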

Highlighted Details

  • Supports over 90 tasks and 30 models, including recent additions like audio evaluation (Aero-1-Audio, Qwen2-Audio) and video benchmarks (Video-MMMU, TemporalBench).
  • Integrates vLLM for accelerated evaluations and supports OpenAI-compatible APIs.
  • Provides scripts for reproducing LLaVA-1.5 paper results and detailed evaluation results for LLaVA family models.
  • Offers an SGLang Runtime API for specific models and tensor parallelism for larger models.
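Support for OpenAI-compatible APIs means an evaluated (or judging) model is reached over the standard chat-completions protocol. A minimal sketch of such a multimodal request body follows; the model name and image URL are placeholders, not lmms-eval defaults:

```python
import json

# Sketch of an OpenAI-compatible chat-completions payload carrying an image.
# The model name and URL below are placeholders for illustration only.
def build_request(model: str, prompt: str, image_url: str) -> dict:
    """Build a multimodal chat-completions request body."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

body = build_request("my-lmm", "What is in this image?",
                     "https://example.com/cat.png")
print(json.dumps(body, indent=2))
```

Any server that accepts this payload shape, such as a vLLM-backed endpoint, can slot in as an evaluation target.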

Maintenance & Community

The project is actively maintained with frequent updates and contributions. Recent announcements highlight new model and task integrations. Community engagement is encouraged via GitHub issues and PRs. A Discord server is available at discord/lmms-eval.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Given its origin as a fork of lm-evaluation-harness (which is Apache 2.0 licensed), users should verify the specific license terms for commercial use or closed-source linking.

Limitations & Caveats

The README notes that the lack of unified input/output formats for LMMs in Hugging Face necessitates creating a new class for each model, which is acknowledged as suboptimal and planned for future unification. Some environment configurations (e.g., specific numpy or protobuf versions) may require manual adjustments to avoid errors.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 28
  • Issues (30d): 22
  • Star History: 446 stars in the last 90 days
