LMM evaluation toolkit for text, image, video, and audio tasks
This toolkit provides a comprehensive framework for evaluating Large Multimodal Models (LMMs) across text, image, video, and audio modalities. It aims to accelerate LMM development by offering a unified interface for over 90 tasks and 30 models, benefiting researchers and developers in the LMM space.
How It Works
lmms-eval is a fork of lm-evaluation-harness, adapting its efficient design for multimodal models. It addresses the fragmentation of multimodal benchmarks by providing integrated data and model interfaces. Key architectural changes include processing images during the model's response phase (to keep memory usage manageable) and a separate class for each LMM, since Hugging Face does not provide a unified input/output format for these models.
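As a purely illustrative sketch of that design (the class and method names below are hypothetical, not the toolkit's actual API), each model gets its own wrapper that implements a common interface, handling its own prompt template and image preprocessing at response time:

# Illustrative only: names are hypothetical, not lmms-eval's real classes.
from abc import ABC, abstractmethod
from typing import List

class LMMWrapper(ABC):
    """Common interface the evaluation loop calls for every model."""

    @abstractmethod
    def generate_until(self, requests: List[dict]) -> List[str]:
        """Return one text response per request (prompt plus raw images)."""

class LlavaWrapper(LMMWrapper):
    """Each LMM needs its own wrapper because prompt templates, image
    preprocessing, and generation APIs differ between models."""

    def __init__(self, model, image_processor, tokenizer):
        self.model = model
        self.image_processor = image_processor
        self.tokenizer = tokenizer

    def generate_until(self, requests):
        outputs = []
        for req in requests:
            # Images are loaded and preprocessed here, at response time,
            # rather than materialized for the whole dataset up front.
            pixels = self.image_processor(req["images"])
            prompt = f"USER: <image>\n{req['prompt']} ASSISTANT:"
            ids = self.tokenizer(prompt, return_tensors="pt").input_ids
            out = self.model.generate(ids, images=pixels, max_new_tokens=128)
            outputs.append(self.tokenizer.decode(out[0], skip_special_tokens=True))
        return outputs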
Quick Start & Requirements
Install from source:
uv pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git
Some tasks additionally require Java 8 (install with conda install openjdk=8).
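Evaluations are then launched from the command line. A minimal sketch (the model, checkpoint, and task names below are placeholders; the flags follow the lm-evaluation-harness-style interface the project inherits):
python -m lmms_eval \
    --model llava \
    --model_args pretrained=liuhaotian/llava-v1.5-7b \
    --tasks mme \
    --batch_size 1 \
    --output_path ./logs/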
Highlighted Details
Maintenance & Community
The project is actively maintained, with frequent updates and contributions. Recent announcements highlight new model and task integrations. Community engagement is encouraged via GitHub issues and PRs, and a Discord server is available at discord/lmms-eval.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Given its origin as a fork of lm-evaluation-harness (which is Apache 2.0 licensed), users should verify the specific license terms before commercial use or closed-source linking.
Limitations & Caveats
The README notes that, because Hugging Face lacks unified input/output formats for LMMs, a new class must be written for each model; the maintainers acknowledge this as suboptimal and plan to unify it in the future. Some environment configurations (e.g., specific numpy or protobuf versions) may require manual adjustment to avoid errors.