modelscope / EvalScope: Evaluation framework for large models
Top 20.0% on SourcePulse
EvalScope is a comprehensive framework for evaluating and benchmarking diverse large models, including LLMs and multimodal models. It supports various assessment scenarios like RAG, arena mode, and inference performance testing, offering built-in benchmarks and metrics. The framework is designed for researchers and developers needing a streamlined, customizable solution for model evaluation, seamlessly integrating with training frameworks like ms-swift.
How It Works
EvalScope employs a modular architecture with Model Adapters for input standardization, Data Adapters for data processing, and multiple Evaluation Backends. It supports its native backend, OpenCompass, VLMEvalKit for multimodal tasks, and RAGEval for RAG scenarios, alongside third-party integrations like ToolBench. A dedicated Performance Evaluator module measures inference service performance, with results compiled into comprehensive reports and visualizations.
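In practice, the backend choice is expressed in the task configuration passed to run_task. A minimal sketch follows; the eval_backend/eval_config keys, the backend identifier string, and the import path are assumptions, so check the EvalScope documentation for the exact schema.

```python
# Sketch only: key names and backend identifiers below are assumptions,
# not a verified EvalScope API reference.
from evalscope import run_task  # import path may differ between releases

task_cfg = {
    "eval_backend": "VLMEvalKit",  # route multimodal tasks to the VLMEvalKit backend
    "eval_config": {
        # backend-specific settings (models, datasets, output dir, ...) go here
    },
}

run_task(task_cfg=task_cfg)
```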
Quick Start & Requirements
Install with pip install evalscope, or pip install 'evalscope[all]' for all backends; extras are also available per backend (opencompass, vlmeval, rag, perf, app). To install from source, git clone the repository and run pip install -e . in the checkout. Run an evaluation from the CLI with evalscope eval --model <model_id> --datasets <dataset_names> --limit <num>; a Python API is available via run_task.
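A Python equivalent of the CLI call above might look like the following; the TaskConfig field names mirror the CLI flags but are assumptions, and the model and dataset values are placeholders.

```python
# Sketch of the run_task API mirroring the CLI flags; field names are assumptions.
from evalscope import TaskConfig, run_task

task_cfg = TaskConfig(
    model="<model_id>",            # e.g. a ModelScope or Hugging Face model id
    datasets=["<dataset_name>"],   # one or more built-in benchmark names
    limit=10,                      # evaluate only the first N samples per dataset
)

run_task(task_cfg=task_cfg)
```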
Highlighted Details
Integrates with wandb and swanlab for tracking and visualizing evaluation results.
Maintenance & Community
Licensing & Compatibility
The exact license is not clearly stated, which could be a consideration for commercial use.
Limitations & Caveats
The package was recently renamed from llmuses to evalscope, so users of older versions need to update their imports.
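For older scripts this is usually just a change of the top-level package name; the exact module layout shown here is an assumption and may vary between releases.

```python
# Before (older llmuses releases, assumed layout):
# from llmuses import run_task
# After (current package name):
from evalscope import run_task
```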