Evaluation toolkit for large multi-modality models (LMMs)
VLMEvalKit is an open-source toolkit for the comprehensive evaluation of Large Vision-Language Models (LVLMs). It lets researchers and developers assess LVLMs across a wide array of benchmarks and models with minimal data-preparation effort, with the goal of standardized, reproducible evaluation results.
How It Works
The toolkit uses generation-based evaluation for all LVLMs, scoring answers either by exact matching or by LLM-based answer extraction. This provides a unified evaluation framework across diverse benchmarks, abstracting away per-benchmark data handling and inference pipelines.
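For example, on a multiple-choice benchmark the scorer can first try to match the predicted option directly and fall back to an LLM judge only when the free-form answer cannot be parsed. Below is a minimal sketch of that two-stage flow; the llm_extract_choice helper is a hypothetical stand-in for the external LLM-judge call, not the toolkit's actual API:

```python
import re

def llm_extract_choice(answer: str, options: dict[str, str]) -> str | None:
    """Hypothetical stand-in for LLM-based answer extraction: in practice an
    external LLM judge is prompted to map a free-form answer to an option."""
    return None  # placeholder only

def exact_match_choice(answer: str, options: dict[str, str]) -> str | None:
    """Try to read the chosen option directly from the raw model answer."""
    # Case 1: the reply leads with a bare option letter ("B", "B.", "(B) ...").
    m = re.match(r"^\(?([A-Z])\)?[.:,)]?\s", answer.strip().upper() + " ")
    if m and m.group(1) in options:
        return m.group(1)
    # Case 2: the reply contains exactly one option's text verbatim.
    hits = [k for k, v in options.items() if v.lower() in answer.lower()]
    return hits[0] if len(hits) == 1 else None

def score(answer: str, options: dict[str, str], ground_truth: str) -> bool:
    choice = exact_match_choice(answer, options)
    if choice is None:  # exact matching failed: defer to the LLM judge
        choice = llm_extract_choice(answer, options)
    return choice == ground_truth

opts = {"A": "a cat", "B": "a dog"}
print(score("B. a dog", opts, "B"))                    # exact match -> True
print(score("It looks like a dog to me.", opts, "B"))  # option-text match -> True
```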
Quick Start & Requirements
pip install vlmeval
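After installation, a quick single-image sanity check can go through the toolkit's Python API. The sketch below assumes the supported_VLM model registry; the model name and image path are illustrative, and the exact generate signature may differ across vlmeval versions:

```python
# Single-image inference sanity check (model name and image path illustrative).
from vlmeval.config import supported_VLM

model = supported_VLM['qwen_chat']()  # instantiate a model registered in the toolkit
response = model.generate(['demo.jpg', 'What is in this image?'])
print(response)
```

Full benchmark runs are launched from the repository's run.py script, which selects datasets and models by name.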
Specific transformers versions are recommended for different models (e.g., transformers==4.37.0 for the LLaVA series and transformers==4.45.0 for Aria). torchvision>=0.16 is recommended for Moondream and Aria, and installing flash-attn is recommended for Aria.

Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats