opencompass by open-compass

LLM evaluation platform for assessing model capabilities across diverse datasets

created 2 years ago
5,775 stars

Top 9.1% on sourcepulse

Project Summary

OpenCompass is a comprehensive LLM evaluation platform designed for researchers and developers to assess the performance of various large language models across a wide array of datasets. It offers a standardized, reproducible, and extensible framework for evaluating capabilities such as knowledge, reasoning, coding, and instruction following, aiming to provide a fair and open benchmark.

How It Works

OpenCompass employs a modular design that supports a diverse range of models (including HuggingFace and API-based) and over 100 datasets. It facilitates efficient distributed evaluation, allowing for rapid assessment of large models. The platform supports multiple evaluation paradigms like zero-shot, few-shot, and chain-of-thought prompting with customizable prompt templates, enabling users to elicit maximum model performance.
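As a concrete illustration of the modular design, OpenCompass evaluations are typically driven by Python config files that pair model definitions with dataset definitions. The sketch below follows the style of the project's demo configs; the exact module paths and variable names are assumptions based on the documented config layout, not verified against the current release.

```python
# Hedged sketch of an OpenCompass evaluation config (e.g. configs/eval_demo.py).
# The imported module paths below are illustrative; check the repo's configs/
# directory for the actual dataset and model config names.
from mmengine.config import read_base

with read_base():
    # Pull in a pre-defined dataset config (few-shot GSM8K here) ...
    from .datasets.demo.demo_gsm8k_chat_gen import gsm8k_datasets
    # ... and a pre-defined HuggingFace model config.
    from .models.hf_internlm.hf_internlm2_chat_1_8b import models as internlm2_models

# The runner picks up these two top-level lists; swapping either one
# (different prompts, different model backends) changes the evaluation
# without touching the rest of the pipeline.
datasets = gsm8k_datasets
models = internlm2_models
```

This separation of models and datasets into composable config fragments is what lets the same dataset be re-evaluated under zero-shot, few-shot, or chain-of-thought prompt templates by swapping the dataset config alone.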

Quick Start & Requirements

  • Installation: pip install -U opencompass (full installation: pip install "opencompass[full]") or from source.
  • Prerequisites: Python 3.10+ recommended. Optional lmdeploy or vLLM acceleration requires separate installation. Datasets can be downloaded manually or loaded automatically via ModelScope (pip install "modelscope[framework]"; the quotes prevent the shell from expanding the brackets).
  • Resources: Evaluation can be resource-intensive, especially for large models and datasets.
  • Links: Website, Documentation, Installation Guide.
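Putting the steps above together, a minimal install-and-run session might look like the following. The model and dataset aliases passed to the CLI are assumptions drawn from the project's demo documentation; substitute names from your installed version.

```shell
# Install the core package (use the [full] extra for optional dataset deps).
pip install -U opencompass
# or: pip install -U "opencompass[full]"

# Hedged example run, assuming the documented CLI aliases exist:
# evaluate a small HuggingFace chat model on a demo GSM8K split.
opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen --debug
```

Results land in an outputs/ directory by default; larger models and full datasets will need correspondingly more GPU memory and time, per the resources note above.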

Highlighted Details

  • Supports over 100 datasets and 20+ HuggingFace and API models (e.g., Llama3, Mistral, GPT-4, Qwen).
  • Features efficient distributed evaluation, capable of completing billion-scale model evaluations in hours.
  • Offers diversified evaluation paradigms including zero-shot, few-shot, and chain-of-thought.
  • Highly extensible modular design for adding new models, datasets, or cluster management systems.
  • Includes CompassKit, CompassHub (benchmark browser), and CompassRank (leaderboards).

Maintenance & Community

The project is actively maintained with frequent updates and new feature additions. Community engagement is encouraged via Discord and WeChat. Links to the website, documentation, and issue reporting are provided.

Licensing & Compatibility

The project is licensed under the Apache-2.0 license, which permits commercial use and linking with closed-source software.

Limitations & Caveats

Version 0.4.0 introduced breaking changes that consolidated configuration files. While many models are supported out of the box, users may need to follow extra setup steps for certain third-party features such as HumanEval. The roadmap indicates ongoing development of features like a long-context leaderboard and a coding evaluation leaderboard.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 40
  • Issues (30d): 12
  • Star History: 531 stars in the last 90 days

