Awesome-LLM-as-a-judge (by llm-as-a-judge)

Survey of research papers on LLMs as judges

Created 8 months ago · 385 stars
Top 75.5% on sourcepulse

View on GitHub
Project Summary

This repository is a curated list of academic papers related to "LLM-as-a-judge," a technique where large language models are used to evaluate or judge the output of other language models or systems. It serves researchers and practitioners interested in leveraging LLMs for automated evaluation, alignment, and quality assessment across various NLP tasks.

How It Works

The repository organizes papers along three axes of LLM-as-a-judge research:

  • Evaluation attributes: helpfulness, harmlessness, reliability, relevance, feasibility, and overall quality.
  • Methodologies: tuning data sources, prompting techniques, and tuning methods such as supervised fine-tuning and preference learning.
  • Applications: evaluation, alignment, retrieval, and reasoning.

Together these categories aim to provide a comprehensive overview of the research landscape in this rapidly evolving field.
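To make the prompting-based methodology concrete, here is a minimal sketch of a single-response LLM-as-a-judge loop. This is an illustration only, not code from the repository or any surveyed paper: the function names (`judge_output`, `parse_score`), the template wording, and the 1-10 scale are all assumptions, and the model call is stubbed out.

```python
import re

# Hypothetical judge prompt; real papers vary the rubric, scale, and format.
JUDGE_TEMPLATE = """You are an impartial judge. Rate the response below on a
scale of 1-10 for overall quality (helpfulness, harmlessness, relevance).
Reply with only the number.

[Question]
{question}

[Response]
{response}
"""

def build_judge_prompt(question: str, response: str) -> str:
    """Fill the judge template with the item to be evaluated."""
    return JUDGE_TEMPLATE.format(question=question, response=response)

def parse_score(raw: str, lo: int = 1, hi: int = 10) -> int:
    """Extract the first integer in the judge's reply, clamped to [lo, hi]."""
    match = re.search(r"\d+", raw)
    if match is None:
        raise ValueError(f"no score found in judge reply: {raw!r}")
    return max(lo, min(hi, int(match.group())))

def judge_output(question: str, response: str, call_llm) -> int:
    """Score one response; call_llm is any callable prompt -> reply string."""
    return parse_score(call_llm(build_judge_prompt(question, response)))

# Stubbed judge for demonstration; swap in a real model API call in practice.
score = judge_output("What is 2+2?", "4", call_llm=lambda prompt: "9")
```

Pairwise comparison (judging two responses head-to-head) follows the same pattern with a template asking which response is better; several surveyed papers note that scoring and pairwise setups have different bias profiles.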

Quick Start & Requirements

This repository is a collection of research papers, so there is nothing to install or run. To read a paper, follow its link (typically to arXiv or conference proceedings).

Highlighted Details

  • Comprehensive Categorization: Papers are meticulously categorized by evaluation attributes, methodologies, and applications, offering a structured view of the LLM-as-a-judge domain.
  • Regular Updates: The repository is actively updated with recent publications, ensuring coverage of the latest advancements in the field.
  • "Thinking LLM-as-a-judge" Focus: A dedicated section highlights papers exploring more advanced reasoning capabilities of LLMs when acting as judges.
  • Links to Resources: Each paper entry includes a direct link to its source (e.g., arXiv), facilitating easy access for further reading.

Maintenance & Community

The repository is maintained by the "llm-as-a-judge" community. Updates are regularly posted, indicating active development and community engagement.

Licensing & Compatibility

The repository itself is typically licensed under permissive terms (e.g., MIT), but the licensing of the individual papers referenced depends on their respective publication venues.

Limitations & Caveats

This is a curated list of papers and does not provide any code, models, or tools for implementing LLM-as-a-judge systems. Users must consult the individual papers for implementation details and potential limitations of specific approaches.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 76 stars in the last 90 days
