UltraEval by OpenBMB

An open-source framework for evaluating foundation models

Created 2 years ago

253 stars

Project Summary

UltraEval is an open-source framework for evaluating foundation models. It provides a lightweight, easy-to-use, and scalable system for assessing mainstream large language models (LLMs), giving researchers and engineers a standardized, transparent, and flexible evaluation process.

How It Works

UltraEval has a lightweight design with few dependencies, which keeps deployment simple and lets the framework scale. All datasets share a unified prompt template, and the evaluation metrics are extensive and customizable. For efficient inference, UltraEval supports multiple model deployment backends, including torch and vLLM, so several model instances can serve evaluation requests in parallel.
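The evaluation flow described above (one shared prompt template, several model instances working in parallel) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not UltraEval's actual code: the template string, `render`, `evaluate_parallel`, and the stub model instances are all hypothetical. A real deployment would replace each stub with a call to a torch or vLLM server.

```python
from concurrent.futures import ThreadPoolExecutor
from string import Template
from typing import Callable, Dict, List

# Hypothetical unified template: every dataset fills the same named slot.
PROMPT_TEMPLATE = Template("Question: $question\nAnswer:")

# A "model instance" is any callable mapping a prompt to a completion.
# Real setups would wrap an HTTP call to a torch or vLLM server; here it is a stub.
ModelInstance = Callable[[str], str]


def render(example: Dict[str, str]) -> str:
    """Fill the shared template from one dataset record."""
    return PROMPT_TEMPLATE.substitute(question=example["question"])


def evaluate_parallel(examples: List[Dict[str, str]], instances: List[ModelInstance]) -> List[str]:
    """Send prompts to the instances round-robin and return outputs in input order."""
    prompts = [render(ex) for ex in examples]
    with ThreadPoolExecutor(max_workers=len(instances)) as pool:
        futures = [
            pool.submit(instances[i % len(instances)], prompt)
            for i, prompt in enumerate(prompts)
        ]
        return [f.result() for f in futures]


def make_stub(name: str) -> ModelInstance:
    """Hypothetical stand-in model: echoes its name and the first prompt line."""
    return lambda prompt: f"{name}:{prompt.splitlines()[0]}"


examples = [{"question": "1+1?"}, {"question": "2+2?"}, {"question": "3+3?"}]
preds = evaluate_parallel(examples, [make_stub("a"), make_stub("b")])
print(preds)  # ['a:Question: 1+1?', 'b:Question: 2+2?', 'a:Question: 3+3?']
```

Collecting results from the futures list in submission order keeps predictions aligned with the dataset, even when instances finish out of order.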

Quick Start & Requirements

Installation: clone the repository (git clone https://github.com/OpenBMB/UltraEval.git), enter it (cd UltraEval), and install the package (pip install .).
Data: download the datasets (wget "https://cloud.tsinghua.edu.cn/f/11d562a53e40411fb385/?dl=1"), unzip the archive, preprocess it, and generate the config files (python configs/make_config.py).
Evaluation: deploy the model (e.g., python URLs/vllm_url.py), then run python main.py.
Prerequisites: Python, wget, and unzip; a GPU with CUDA is recommended for model deployment.
Resources: paper, website, quick start, tutorials, Colab notebook.
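The Quick Start steps could be scripted roughly as follows. This is a hypothetical helper, not part of UltraEval. It prints each command by default (a dry run) and executes them only when `dry_run=False`. Several details are assumptions: the archive name RawData.zip, running every step after the clone inside the UltraEval directory, and calling the scripts without arguments. The preprocessing command is left out because it is not given here; the UltraEval README is the authority on exact commands and flags.

```python
import shlex
import subprocess

REPO_URL = "https://github.com/OpenBMB/UltraEval.git"
DATA_URL = "https://cloud.tsinghua.edu.cn/f/11d562a53e40411fb385/?dl=1"
ARCHIVE = "RawData.zip"  # hypothetical local name for the downloaded dataset archive

# (working directory, command) pairs in Quick Start order.
STEPS = [
    (".", ["git", "clone", REPO_URL]),
    ("UltraEval", ["pip", "install", "."]),
    ("UltraEval", ["wget", "-O", ARCHIVE, DATA_URL]),
    ("UltraEval", ["unzip", ARCHIVE]),
    # The dataset preprocessing step belongs here; see the UltraEval README for its command.
    ("UltraEval", ["python", "configs/make_config.py"]),
    # Starts a model server that keeps running; in practice, launch it in a separate terminal.
    ("UltraEval", ["python", "URLs/vllm_url.py"]),
    ("UltraEval", ["python", "main.py"]),
]


def run_steps(steps, dry_run=True):
    """Print each command as a shell line; execute it only when dry_run is False."""
    rendered = []
    for cwd, cmd in steps:
        line = shlex.join(cmd) if cwd == "." else f"(cd {cwd} && {shlex.join(cmd)})"
        rendered.append(line)
        print(line)
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=cwd, check=True)
    return rendered


commands = run_steps(STEPS)  # dry run: prints the commands, executes nothing
```

Passing each command as a list with `cwd=` avoids relying on a shell `cd`. `shlex.join` quotes the dataset URL because it contains `?`.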

Highlighted Details

Supports 59 diverse evaluation datasets across knowledge, math, code, reasoning, and language tasks.
Features a flexible system with a unified prompt template and extensive, customizable metrics.
Enables efficient inference deployment via torch and vLLM for rapid, multi-instance evaluation.
Maintains a transparent, traceable, and reproducible open-source leaderboard.
Utilizes official evaluation sets for standardized, comparable results.
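Customizable metrics can be illustrated with a small registry of scoring functions. The metric names, the `register` decorator, and the last-number heuristic below are hypothetical examples, not UltraEval's built-in metrics. They only show how per-dataset metrics can plug into one shared scoring loop.

```python
import re
from typing import Callable, Dict, List

Metric = Callable[[str, str], float]
METRICS: Dict[str, Metric] = {}  # metric name -> scoring function


def register(name: str):
    """Decorator that adds a scoring function to the registry under `name`."""
    def wrap(fn: Metric) -> Metric:
        METRICS[name] = fn
        return fn
    return wrap


@register("exact_match")
def exact_match(pred: str, ref: str) -> float:
    """1.0 if the stripped strings are identical, else 0.0."""
    return float(pred.strip() == ref.strip())


@register("last_number_match")
def last_number_match(pred: str, ref: str) -> float:
    """Math-style scoring: compare the last number in the output with the reference."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", pred)
    return float(bool(numbers) and float(numbers[-1]) == float(ref))


def score(metric: str, preds: List[str], refs: List[str]) -> float:
    """Average a registered metric over paired predictions and references."""
    if len(preds) != len(refs):
        raise ValueError("predictions and references must have the same length")
    return sum(METRICS[metric](p, r) for p, r in zip(preds, refs)) / len(refs)


preds = ["The answer is 4", "x = 10", "6"]
refs = ["4", "12", "6"]
print(score("exact_match", preds, refs))        # 1 of 3 correct
print(score("last_number_match", preds, refs))  # 2 of 3 correct
```

Because the scoring loop only looks up metrics by name, adding a metric for a new dataset means registering one function without touching the loop.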

Maintenance & Community

The UltraEval paper was accepted to the ACL 2024 System Demonstration Track. MiniCPM uses UltraEval for its evaluations. The project was open-sourced in late 2023. Discussions and feature requests go through GitHub Issues. Acknowledgements: HuggingFace, vLLM, Harness, OpenCompass.

Licensing & Compatibility

Released under the Apache-2.0 license, which permits commercial use and integration into closed-source projects.

Limitations & Caveats

The README does not explicitly list limitations such as alpha status or known bugs. Advanced usage and specific configurations may require consulting Tutorials.md.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days