LLM-eval-survey by MLGroupJLU

Survey paper resource for LLM evaluation

Created 2 years ago
1,595 stars

Top 25.8% on SourcePulse

Project Summary

This repository serves as a comprehensive, curated collection of academic papers and resources focused on the evaluation of Large Language Models (LLMs). It aims to provide researchers and practitioners with a structured overview of the evolving landscape of LLM assessment across various domains and tasks.

How It Works

The project organizes papers and resources based on the categories outlined in the survey paper "A Survey on Evaluation of Large Language Models." This structured approach allows users to navigate and discover relevant research concerning what aspects of LLMs to evaluate (e.g., natural language understanding, reasoning, robustness, ethics) and where to evaluate them using specific benchmarks.

Quick Start & Requirements

This repository is a collection of research papers; there is nothing to install or run. Users can browse the listed papers and follow their associated links (arXiv, GitHub, etc.) for further details.

Highlighted Details

  • Extensive categorization of LLM evaluation research, covering natural language processing, robustness, ethics, social sciences, natural sciences, medicine, and agent applications.
  • Detailed listing and categorization of numerous LLM evaluation benchmarks, including their focus, domain, and evaluation criteria.
  • Updated regularly, making the repository a more current source than the initial arXiv version of the survey.
  • Actively welcomes community contributions via pull requests and issues to enhance the survey's completeness.

Maintenance & Community

The project is maintained by the authors of the survey paper, with acknowledgments for contributions from Tahmid Rahman, Hao Zhao, Chenhui Zhang, Damien Sileo, Peiyi Wang, Zengzhi Wang, Kenneth Leung, Aml-Hassan-Abd-El-hamid, and Taicheng Guo.

Licensing & Compatibility

The repository itself does not specify a license, but it curates links to academic papers, which are typically governed by their respective publication licenses or terms of use.

Limitations & Caveats

As a survey and resource collection, this repository does not provide executable code or evaluation tools itself. Users must refer to the linked papers and projects for implementation details and usage.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
