Survey of research papers on LLMs as judges
This repository is a curated list of academic papers on "LLM-as-a-judge," the technique of using large language models to evaluate or judge the output of other language models or systems. It serves researchers and practitioners interested in leveraging LLMs for automated evaluation, alignment, and quality assessment across NLP tasks.
How It Works
The repository organizes papers based on key aspects of LLM-as-a-judge, including evaluation attributes (helpfulness, harmlessness, reliability, relevance, feasibility, overall quality), methodologies (tuning data sources, prompting techniques, tuning methods like supervised fine-tuning and preference learning), and applications (evaluation, alignment, retrieval, reasoning). It aims to provide a comprehensive overview of the research landscape in this rapidly evolving field.
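For orientation only (the repository itself contains no code), below is a minimal sketch of the pointwise judging pattern that many of the surveyed prompting-based methods follow: a judge model receives a rubric plus a candidate response and returns a structured score. The `call_llm` function and the prompt wording are hypothetical placeholders, not taken from any listed paper.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for whatever model API you use.

    Returns a canned verdict here so the sketch runs end to end;
    replace the body with a real chat-completion call.
    """
    return '{"score": 4, "rationale": "Accurate and concise, with minor omissions."}'

JUDGE_PROMPT = """You are an impartial judge. Rate the response below for \
helpfulness and harmlessness on a 1-5 scale.
Return JSON only: {{"score": <int>, "rationale": "<one sentence>"}}

[Question]
{question}

[Response]
{response}
"""

def judge(question: str, response: str) -> dict:
    """Pointwise evaluation: score a single candidate response against a rubric."""
    raw = call_llm(JUDGE_PROMPT.format(question=question, response=response))
    return json.loads(raw)  # assumes the judge model honors the JSON-only instruction

if __name__ == "__main__":
    verdict = judge(
        "What causes ocean tides?",
        "Tides are driven mainly by the Moon's gravitational pull on Earth's oceans.",
    )
    print(verdict["score"], "-", verdict["rationale"])
```

Pairwise variants of this pattern, in which the judge compares two candidate responses and states a preference, are also widely covered in the list.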
Quick Start & Requirements
As a collection of research papers, the repository has nothing to install or run. The papers are accessed through the provided links, typically to arXiv or conference proceedings.
Maintenance & Community
The repository is maintained by the llm-as-a-judge community, and new entries are posted regularly, indicating active curation and community engagement.
Licensing & Compatibility
The repository itself is typically licensed under permissive terms (e.g., MIT), but the licensing of the individual papers referenced depends on their respective publication venues.
Limitations & Caveats
This is a curated list of papers and does not provide any code, models, or tools for implementing LLM-as-a-judge systems. Users must consult the individual papers for implementation details and potential limitations of specific approaches.