An open-source framework for evaluating foundation models
Summary
UltraEval is an open-source framework for evaluating foundation models, offering a lightweight, easy-to-use, and scalable system for assessing mainstream LLMs. It benefits researchers and engineers by providing a standardized, transparent, and flexible evaluation process.
How It Works
The framework features a lightweight design with minimal dependencies for effortless deployment and scalability. It supports a unified prompt template with extensive, customizable evaluation metrics. For efficient assessment, UltraEval integrates multiple model deployment strategies, including torch and vLLM, enabling swift, multi-instance evaluation.
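To make the idea of a unified prompt template with pluggable metrics concrete, here is a minimal sketch in plain Python. It does not use UltraEval's actual API: the names PROMPT, build_prompt, exact_match, evaluate, and toy_model are all hypothetical, and the toy model stands in for a real torch or vLLM deployment.

```python
from string import Template
from typing import Callable, Dict, List

# Hypothetical shared template: every sample is rendered the same way.
PROMPT = Template("Question: $question\nAnswer:")


def build_prompt(sample: Dict[str, str]) -> str:
    """Render one sample through the shared template."""
    return PROMPT.substitute(question=sample["question"])


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 when the strings match after trimming and lowercasing."""
    return float(prediction.strip().lower() == reference.strip().lower())


def evaluate(
    samples: List[Dict[str, str]],
    generate: Callable[[str], str],
    metrics: Dict[str, Callable[[str, str], float]],
) -> Dict[str, float]:
    """Average each metric over all samples."""
    totals = {name: 0.0 for name in metrics}
    for sample in samples:
        prediction = generate(build_prompt(sample))
        for name, metric in metrics.items():
            totals[name] += metric(prediction, sample["answer"])
    return {name: total / len(samples) for name, total in totals.items()}


def toy_model(prompt: str) -> str:
    """Stand-in for a deployed model (assumption, not a real backend)."""
    return "Paris" if "France" in prompt else "unknown"


samples = [
    {"question": "What is the capital of France?", "answer": "paris"},
    {"question": "What is the capital of Peru?", "answer": "Lima"},
]
scores = evaluate(samples, toy_model, {"exact_match": exact_match})
print(scores)
```

Passing metrics as a dictionary of callables is one simple way to keep them customizable: adding a metric means adding one entry, and the evaluation loop stays unchanged.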
Quick Start & Requirements
Installation: run git clone https://github.com/OpenBMB/UltraEval.git, then cd UltraEval, then pip install . from the repository root.
Key steps: download the datasets (wget "https://cloud.tsinghua.edu.cn/f/11d562a53e40411fb385/?dl=1"), unzip and preprocess them, and generate the config files (python configs/make_config.py).
Model evaluation: deploy the model (e.g., python URLs/vllm_url.py), then run python main.py.
Prerequisites: Python, wget, unzip; GPU/CUDA recommended for deployment.
Resources: paper, website, quick start, tutorials, Colab notebook.
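The quick-start sequence can be sketched as a small Python driver. This is an illustration under stated assumptions, not a script shipped with UltraEval: the helper names quickstart_steps and run are hypothetical, only commands named in this summary are included, and the unzip and preprocessing commands and all script arguments are left out because the summary does not give them. The driver defaults to a dry run that only prints the commands. In practice the deployment script probably needs its own process, since a model server usually keeps running.

```python
import shlex
import subprocess

REPO_URL = "https://github.com/OpenBMB/UltraEval.git"
DATA_URL = "https://cloud.tsinghua.edu.cn/f/11d562a53e40411fb385/?dl=1"


def quickstart_steps():
    """Return (working_dir, argv) pairs for the commands named in the summary."""
    return [
        (".", ["git", "clone", REPO_URL]),
        ("UltraEval", ["pip", "install", "."]),
        ("UltraEval", ["wget", DATA_URL]),
        # Unzipping and preprocessing belong here; the exact commands are
        # not given in the summary, so they are omitted rather than guessed.
        ("UltraEval", ["python", "configs/make_config.py"]),
        # The two scripts below take arguments that the summary omits.
        ("UltraEval", ["python", "URLs/vllm_url.py"]),
        ("UltraEval", ["python", "main.py"]),
    ]


def run(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print(f"[{cwd}] {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run(quickstart_steps())  # dry run: prints the commands only
```

Keeping each command as an argument list rather than a single shell string avoids quoting problems with the dataset URL, which contains a question mark.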
Highlighted Details
Maintenance & Community
The UltraEval paper was accepted to the ACL 2024 System Demonstration Track and has been published. MiniCPM uses UltraEval for its evaluations. The project was open-sourced in late 2023. Community engagement happens through GitHub Issues, which are used for discussions and feature requests. Acknowledgements: HuggingFace, vLLM, Harness, OpenCompass.
Licensing & Compatibility
Released under the Apache-2.0 license, which is permissive for commercial use and integration within closed-source projects.
Limitations & Caveats
The README does not explicitly state limitations such as alpha status or known bugs. Advanced usage or specific configurations may require consulting Tutorials.md.