OpenBMB
An open-source framework for evaluating foundation models
Summary
UltraEval is an open-source framework for evaluating foundation models, offering a lightweight, easy-to-use, and scalable system for assessing mainstream LLMs. It benefits researchers and engineers by providing a standardized, transparent, and flexible evaluation process.
How It Works
The framework is lightweight, with minimal dependencies, which keeps deployment and scaling simple. It uses a unified prompt template and supports an extensive set of customizable evaluation metrics. For efficient assessment, UltraEval supports multiple model deployment strategies, including torch and vLLM, so evaluation can run quickly across multiple model instances.
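The summary does not show UltraEval's actual template or metric interfaces. As a minimal sketch, assuming a template is a Python format string and a metric is a function of (prediction, reference), a unified prompt template with a pluggable metric registry might look like this. `PromptTemplate`, `METRICS`, `exact_match`, and `evaluate` are hypothetical names, not UltraEval's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PromptTemplate:
    """Hypothetical unified template: one format string reused across tasks."""
    template: str  # e.g. "Question: {question}\nAnswer:"

    def render(self, example: Dict[str, str]) -> str:
        # Extra keys such as "answer" are ignored by str.format.
        return self.template.format(**example)


def exact_match(prediction: str, reference: str) -> float:
    # Case-insensitive exact match after stripping surrounding whitespace.
    return float(prediction.strip().lower() == reference.strip().lower())


# Hypothetical registry: new metrics are added by name without touching evaluate().
METRICS: Dict[str, Callable[[str, str], float]] = {"exact_match": exact_match}


def evaluate(model: Callable[[str], str],
             dataset: List[Dict[str, str]],
             template: PromptTemplate,
             metric_names: List[str]) -> Dict[str, float]:
    """Render each example, query the model, and average every requested metric."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    totals = {name: 0.0 for name in metric_names}
    for example in dataset:
        prediction = model(template.render(example))
        for name in metric_names:
            totals[name] += METRICS[name](prediction, example["answer"])
    return {name: total / len(dataset) for name, total in totals.items()}


# Usage with a toy stand-in model (not a real LLM).
def toy_model(prompt: str) -> str:
    return "Paris" if "France" in prompt else "unknown"


data = [
    {"question": "Capital of France?", "answer": "paris"},
    {"question": "Capital of Peru?", "answer": "Lima"},
]
scores = evaluate(toy_model, data,
                  PromptTemplate("Question: {question}\nAnswer:"), ["exact_match"])
print(scores)  # {'exact_match': 0.5}
```

Keeping the template and the metrics separate from the model call is what lets one evaluation loop serve many tasks; swapping in a different metric only means registering another function.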
Quick Start & Requirements
Installation: git clone https://github.com/OpenBMB/UltraEval.git, then cd UltraEval, then pip install .
Data and configs: download the datasets (wget "https://cloud.tsinghua.edu.cn/f/11d562a53e40411fb385/?dl=1"), unzip and preprocess them, then generate config files with python configs/make_config.py.
Evaluation: deploy the model (e.g., python URLs/vllm_url.py), then run python main.py.
Prerequisites: Python, wget, and unzip; a GPU with CUDA is recommended for model deployment.
Resources: paper, website, quick start, tutorials, Colab notebook.
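Deploying the model as a separate step before running main.py suggests that evaluation talks to one or more served model instances. The following hypothetical sketch, assuming each instance can be treated as a prompt-to-text function, shows round-robin, concurrent dispatch across several instances. In a real setup each callable would be a request to a server such as one started by URLs/vllm_url.py; here stubs are used so the sketch runs offline. `make_stub_instance` and `evaluate_multi_instance` are illustrative names, not part of UltraEval.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from typing import Callable, List

# Hypothetical: one deployed model instance, modeled as prompt -> completion.
Instance = Callable[[str], str]


def make_stub_instance(name: str) -> Instance:
    # Stand-in for an HTTP call to a served model; tags output with the instance name.
    def generate(prompt: str) -> str:
        return f"{name}:{prompt.upper()}"
    return generate


def evaluate_multi_instance(prompts: List[str], instances: List[Instance],
                            max_workers: int = 4) -> List[str]:
    """Assign prompts to instances round-robin and query them concurrently.

    Results are returned in the same order as `prompts`.
    """
    if not instances:
        raise ValueError("need at least one model instance")
    assignments = list(zip(prompts, cycle(instances)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(inst, prompt) for prompt, inst in assignments]
        # Collect in submission order so outputs line up with prompts.
        return [future.result() for future in futures]


instances = [make_stub_instance("gpu0"), make_stub_instance("gpu1")]
outputs = evaluate_multi_instance(["a", "b", "c"], instances)
print(outputs)  # ['gpu0:A', 'gpu1:B', 'gpu0:C']
```

Round-robin assignment is the simplest balancing choice; a real harness might instead use a shared work queue so faster instances pick up more requests.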
Highlighted Details
Maintenance & Community
The UltraEval paper was accepted to the ACL 2024 System Demonstration Track. MiniCPM uses UltraEval for its evaluations. The project was open-sourced in late 2023. Community discussion and feature requests take place through GitHub Issues. Acknowledgements: HuggingFace, vLLM, Harness, OpenCompass.
Licensing & Compatibility
Released under the Apache-2.0 license, a permissive license that allows commercial use and integration into closed-source projects.
Limitations & Caveats
The README does not explicitly document limitations such as alpha status or known bugs. Advanced usage and specific configurations may require consulting Tutorials.md.
1 year ago
Inactive
mlfoundations
groq
huggingface
stanford-crfm
huggingface
open-compass