Paper list for LLM evaluation, based on a comprehensive survey
This repository serves as a curated list of academic papers on the evaluation of Large Language Models (LLMs). It organizes papers across knowledge and capability, alignment, and safety evaluations, providing a structured resource for researchers and practitioners aiming to understand and improve LLM performance and to support responsible development.
How It Works
The project organizes papers based on the structure of the survey "Evaluating Large Language Models: A Comprehensive Survey." It classifies evaluations into three primary groups: knowledge and capability, alignment, and safety. Within these, papers are further categorized by specific evaluation aspects like reasoning, bias, toxicity, and robustness, offering a detailed taxonomy of LLM assessment research.
Quick Start & Requirements
This repository is a collection of papers and does not require installation or execution. Links to papers and associated GitHub repositories are provided for further exploration.
Maintenance & Community
The list is actively maintained by the authors of the survey paper, and contributions are welcomed via issues, pull requests, or direct email; the maintainers' contact details are listed in the repository for contributions and inquiries.
Licensing & Compatibility
The repository itself does not specify a license, but it links to academic papers, which are typically governed by their respective publication licenses. Compatibility for commercial use would depend on the licenses of the individual papers and their associated codebases.
Limitations & Caveats
This repository is a curated list and does not provide tools or code for performing evaluations. The content is limited to papers cited in the referenced survey, so its comprehensiveness depends on the survey's scope.