Survey of research papers on LLMs as judges
This repository is a curated list of academic papers on "LLM-as-a-judge," the technique of using large language models to evaluate or judge the output of other language models or systems. It serves researchers and practitioners interested in leveraging LLMs for automated evaluation, alignment, and quality assessment across NLP tasks.
How It Works
The repository organizes papers based on key aspects of LLM-as-a-judge, including evaluation attributes (helpfulness, harmlessness, reliability, relevance, feasibility, overall quality), methodologies (tuning data sources, prompting techniques, tuning methods like supervised fine-tuning and preference learning), and applications (evaluation, alignment, retrieval, reasoning). It aims to provide a comprehensive overview of the research landscape in this rapidly evolving field.
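For orientation only (the repository itself contains no code), below is a minimal sketch of the pointwise judging pattern that many of the surveyed prompting-based methods follow: a judge model receives a rubric plus a candidate response and returns a structured score. The `call_llm` function and the prompt wording are hypothetical placeholders, not taken from any listed paper.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for whatever model API you use.

    Returns a canned verdict here so the sketch runs end to end;
    replace the body with a real chat-completion call.
    """
    return '{"score": 4, "rationale": "Accurate and concise, with minor omissions."}'

JUDGE_PROMPT = """You are an impartial judge. Rate the response below for \
helpfulness and harmlessness on a 1-5 scale.
Return JSON only: {{"score": <int>, "rationale": "<one sentence>"}}

[Question]
{question}

[Response]
{response}
"""

def judge(question: str, response: str) -> dict:
    """Pointwise evaluation: score a single candidate response against a rubric."""
    raw = call_llm(JUDGE_PROMPT.format(question=question, response=response))
    return json.loads(raw)  # assumes the judge model honors the JSON-only instruction

if __name__ == "__main__":
    verdict = judge(
        "What causes ocean tides?",
        "Tides are driven mainly by the Moon's gravitational pull on Earth's oceans.",
    )
    print(verdict["score"], "-", verdict["rationale"])
```

Pairwise variants of this pattern, in which the judge compares two candidate responses and states a preference, are also widely covered in the list.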
Quick Start & Requirements
As a collection of research papers, the repository has nothing to install or run. The papers are accessed through the provided links, typically to arXiv or conference proceedings.
Maintenance & Community
The repository is maintained by the llm-as-a-judge community, and new entries are posted regularly, indicating active curation and community engagement.
Licensing & Compatibility
The repository itself is typically licensed under permissive terms (e.g., MIT), but the licensing of the individual papers referenced depends on their respective publication venues.
Limitations & Caveats
This is a curated list of papers and does not provide any code, models, or tools for implementing LLM-as-a-judge systems. Users must consult the individual papers for implementation details and potential limitations of specific approaches.