Awesome-LLMs-Evaluation-Papers by tjunlp-lab

Paper list for LLM evaluation, based on a comprehensive survey

Created 2 years ago

792 stars

Top 44.5% on SourcePulse

1 Expert Loves This Project

Elvis Saravia

Founder of DAIR.AI

Project Summary

This repository is a curated list of academic papers on the evaluation of Large Language Models (LLMs). It groups papers into knowledge and capability, alignment, and safety evaluations, giving researchers and practitioners a structured resource for understanding and improving LLM performance and responsible development.

How It Works

The project organizes papers based on the structure of the survey "Evaluating Large Language Models: A Comprehensive Survey." It classifies evaluations into three primary groups: knowledge and capability, alignment, and safety. Within these, papers are further categorized by specific evaluation aspects like reasoning, bias, toxicity, and robustness, offering a detailed taxonomy of LLM assessment research.

Quick Start & Requirements

This repository is a collection of papers and does not require installation or execution. Links to papers and associated GitHub repositories are provided for further exploration.

Highlighted Details

Comprehensive categorization of LLM evaluation papers.
Includes papers on specialized LLM evaluations (e.g., medicine, law, finance).
Lists various LLM benchmark platforms and leaderboards.
Provides links to papers and their corresponding GitHub repositories.

Maintenance & Community

The list is maintained by the authors of the survey paper, who welcome contributions via issues, pull requests, or direct email. Contact details for contributions and inquiries are provided.

Licensing & Compatibility

The repository does not specify a license. The papers it links to are typically governed by their respective publication licenses, so suitability for commercial use depends on the licenses of the individual papers and their associated codebases.

Limitations & Caveats

This repository is a curated list and does not provide tools or code for performing evaluations. The content is limited to papers cited in the referenced survey, so its coverage depends on the survey's scope.

Health Check

Last Commit: 1 year ago

Responsiveness: Inactive

Star History

5 stars in the last 30 days