Awesome-LLMs-Evaluation-Papers by tjunlp-lab

Paper list for LLM evaluation, based on a comprehensive survey

created 1 year ago
781 stars

Top 45.6% on sourcepulse

Project Summary

This repository is a curated list of academic papers on the evaluation of Large Language Models (LLMs). It groups papers into knowledge and capability, alignment, and safety evaluation, giving researchers and practitioners a structured entry point for understanding how LLMs are assessed and how to improve their performance and support their responsible development.

How It Works

The project organizes papers based on the structure of the survey "Evaluating Large Language Models: A Comprehensive Survey." It classifies evaluations into three primary groups: knowledge and capability, alignment, and safety. Within these, papers are further categorized by specific evaluation aspects like reasoning, bias, toxicity, and robustness, offering a detailed taxonomy of LLM assessment research.

Quick Start & Requirements

This repository is a collection of papers and does not require installation or execution. Links to papers and associated GitHub repositories are provided for further exploration.

Highlighted Details

  • Comprehensive categorization of LLM evaluation papers.
  • Includes papers on specialized LLM evaluations (e.g., medicine, law, finance).
  • Lists various LLM benchmark platforms and leaderboards.
  • Provides links to papers and their corresponding GitHub repositories.

Maintenance & Community

The list is maintained by the authors of the survey paper, who welcome contributions via issues, pull requests, or direct email. Contact details for contributions and inquiries are given in the repository.

Licensing & Compatibility

The repository does not specify a license, so default copyright applies to the list itself. The linked academic papers are governed by their respective publication licenses, and whether any of the linked material can be used commercially depends on the licenses of the individual papers and their associated codebases.

Limitations & Caveats

This repository is a curated list and does not provide tools or code for running evaluations. Its content is limited to papers cited in the referenced survey, so its coverage depends on the survey's scope.

Health Check

Last commit: 1 year ago
Responsiveness: 1 week
Pull requests (30d): 0
Issues (30d): 0
Star history: 26 stars in the last 90 days
