LLM-eval-survey by MLGroupJLU

Survey paper resource for LLM evaluation

created 2 years ago
1,548 stars

Top 27.4% on sourcepulse

View on GitHub: https://github.com/MLGroupJLU/LLM-eval-survey
Project Summary

This repository serves as a comprehensive, curated collection of academic papers and resources focused on the evaluation of Large Language Models (LLMs). It aims to provide researchers and practitioners with a structured overview of the evolving landscape of LLM assessment across various domains and tasks.

How It Works

The project organizes papers and resources based on the categories outlined in the survey paper "A Survey on Evaluation of Large Language Models." This structured approach allows users to navigate and discover relevant research concerning what aspects of LLMs to evaluate (e.g., natural language understanding, reasoning, robustness, ethics) and where to evaluate them using specific benchmarks.
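
To make the two organizing axes concrete, here is a minimal Python sketch of the taxonomy as a plain dict. The category names are paraphrased from this summary, not an official schema from the repository.

```python
# Illustrative sketch only: the "what"/"where" axes as a plain dict.
# Category names are paraphrased from the summary above, not an
# official schema taken from the repository.
taxonomy = {
    "what_to_evaluate": [
        "natural language understanding",
        "reasoning",
        "robustness",
        "ethics",
    ],
    "where_to_evaluate": [
        "general benchmarks",
        "domain-specific benchmarks",
    ],
}

for axis, topics in taxonomy.items():
    print(f"{axis}: {', '.join(topics)}")
```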

Quick Start & Requirements

This repository is a collection of research papers, so there is nothing to install or run. Users can browse the listed papers and follow their associated links (arXiv, GitHub, etc.) for further details.
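
For programmatic browsing, a minimal sketch like the following can pull the arXiv links out of the README. It assumes the repository's default branch is "main" and that the third-party "requests" package is installed; it is not part of the repository itself.

```python
# Minimal sketch (not part of the repository) for extracting arXiv links
# from the README. Assumes the default branch is "main" and that the
# third-party "requests" package is installed.
import re
import requests

RAW_README = "https://raw.githubusercontent.com/MLGroupJLU/LLM-eval-survey/main/README.md"

text = requests.get(RAW_README, timeout=10).text

# Deduplicate and sort every arXiv abstract link mentioned in the README.
arxiv_links = sorted(set(re.findall(r"https?://arxiv\.org/abs/[\w.]+", text)))
print(f"{len(arxiv_links)} arXiv links found")
for link in arxiv_links[:10]:
    print(link)
```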

Highlighted Details

  • Extensive categorization of LLM evaluation research, covering natural language processing, robustness, ethics, social sciences, natural sciences, medicine, and agent applications.
  • Detailed listing and categorization of numerous LLM evaluation benchmarks, including their focus, domain, and evaluation criteria (one possible data model is sketched after this list).
  • Regular updates are provided, with the repository serving as a more current source than the initial arXiv paper.
  • Actively welcomes community contributions via pull requests and issues to enhance the survey's completeness.
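
As referenced above, here is one hypothetical way to model a benchmark entry with the three attributes the list tracks. The field names and sample values are illustrative, not taken from the repository.

```python
# Hypothetical data model, not taken from the repository: one way to
# represent a benchmark entry with the three attributes the list tracks.
from dataclasses import dataclass

@dataclass
class BenchmarkEntry:
    name: str       # benchmark name, e.g. a leaderboard or test suite
    focus: str      # what ability it probes, e.g. "reasoning"
    domain: str     # subject area, e.g. "general", "medicine"
    criteria: str   # how performance is scored, e.g. "accuracy"

entry = BenchmarkEntry(
    name="ExampleBench",   # placeholder name, not a real benchmark
    focus="reasoning",
    domain="general",
    criteria="accuracy",
)
print(entry)
```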

Maintenance & Community

The project is maintained by the authors of the survey paper, with acknowledgments for contributions from Tahmid Rahman, Hao Zhao, Chenhui Zhang, Damien Sileo, Peiyi Wang, Zengzhi Wang, Kenneth Leung, Aml-Hassan-Abd-El-hamid, and Taicheng Guo.

Licensing & Compatibility

The repository itself does not specify a license, but it curates links to academic papers, which are typically governed by their respective publication licenses or terms of use.

Limitations & Caveats

As a survey and resource collection, this repository does not provide executable code or evaluation tools itself. Users must refer to the linked papers and projects for implementation details and usage.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

35 stars in the last 90 days
