Curated list of LLM evaluation tools, datasets, and models
This repository is a curated list of resources for evaluating Large Language Models (LLMs), covering tools, datasets, benchmarks, leaderboards, papers, and models. It is intended to help researchers and practitioners assess the capabilities and limitations of generative AI, with a focus on LLM evaluation.
How It Works
The project is a catalog of LLM evaluation resources, organized into sections such as Tools, Datasets/Benchmarks (further subdivided by task type, e.g. General, RAG, Agent, Code, and Multimodal), Demos, Leaderboards, Papers, and LLM lists. This structure lets users quickly locate resources relevant to a specific evaluation need.
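Because the list itself is a structured Markdown document, its categories can also be browsed programmatically. The snippet below is a minimal, hypothetical sketch (not part of the repository) that fetches the list's README and prints its section headings; the `README_URL` value is a placeholder and must be replaced with the repository's actual raw README URL.

```python
# Minimal sketch (assumption: not provided by the repository) for listing the
# catalog's sections by parsing Markdown headings in its README.
import re
import urllib.request

# Placeholder: substitute the raw URL of the actual README file.
README_URL = "https://raw.githubusercontent.com/<owner>/<repo>/main/README.md"

def list_sections(url: str) -> list[str]:
    """Return all Markdown ATX headings (e.g. 'Tools', 'Datasets/Benchmarks')."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    # Match lines like "## Tools" or "### RAG" and capture the heading text.
    return [m.group(1).strip() for m in re.finditer(r"^#{1,4}\s+(.+)$", text, re.MULTILINE)]

if __name__ == "__main__":
    for heading in list_sections(README_URL):
        print(heading)
```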
Quick Start & Requirements
This is a curated list, not a runnable software project, so no installation or execution is required. The repository simply links to and briefly describes external resources.
Maintenance & Community
The project is maintained by Jun Wang and collaborators, with contributions from various institutions and individuals. The GitHub repository serves as the primary hub for updates and community engagement.
Licensing & Compatibility
The project itself is licensed under the MIT License. However, the linked resources may have their own licenses, which users should verify.
Limitations & Caveats
As a curated list, the quality and maintenance of the linked external resources are beyond the direct control of this repository. Users should exercise due diligence when evaluating and adopting any of the listed tools or datasets.