Awesome-LLMs-as-Judges by CSHaitao

Survey paper for LLM-based evaluation methods

Created 8 months ago · 405 stars · Top 72.8% on sourcepulse

Project Summary

This repository serves as a comprehensive survey and resource hub for "LLMs-as-Judges," a rapidly evolving field where Large Language Models are employed for evaluation tasks across various domains like text generation, question answering, and dialogue systems. It targets researchers, developers, and practitioners seeking to understand and leverage LLM-based evaluation methods for model assessment and enhancement.

How It Works

The project categorizes LLM-as-a-Judge methodologies into single-LLM systems (prompt-based, tuning-based, post-processing) and multi-LLM systems (communication, aggregation), alongside human-AI collaboration. It details applications across general text, multimodal, medical, legal, financial, and educational domains, and critically examines meta-evaluation benchmarks, metrics, limitations, biases, and adversarial attacks.
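To make these categories concrete, here is a minimal sketch (ours, not from the repository) of two of the surveyed patterns: a prompt-based single-LLM judge whose free-form output is post-processed into a score, and majority-vote aggregation over a panel of judges. It assumes an OpenAI-compatible client; the rubric, model names, and the 1-5 scale are illustrative assumptions, not prescriptions from the survey.

```python
# Sketch of (a) a prompt-based single-LLM judge with score post-processing
# and (b) multi-LLM aggregation by majority vote. Rubric, models, and the
# 1-5 scale are illustrative assumptions.
import re
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "You are an impartial judge. Rate the RESPONSE to the QUESTION "
    "on a 1-5 scale for helpfulness and factuality. "
    "Reply with only the integer score."
)

def judge_once(question: str, response: str, model: str = "gpt-4o-mini") -> int | None:
    """Single-LLM judging: prompt the model, then post-process its
    free-form output into a structured score."""
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"QUESTION: {question}\nRESPONSE: {response}"},
        ],
    )
    text = completion.choices[0].message.content or ""
    match = re.search(r"[1-5]", text)  # post-processing: extract the score
    return int(match.group()) if match else None

def judge_panel(question: str, response: str, models: list[str]) -> int | None:
    """Multi-LLM aggregation: collect independent scores and take the
    most common one; None if no judge returned a usable score."""
    scores = [s for m in models if (s := judge_once(question, response, m)) is not None]
    return Counter(scores).most_common(1)[0][0] if scores else None

if __name__ == "__main__":
    print(judge_panel("What is 2 + 2?", "4", ["gpt-4o-mini", "gpt-4o"]))
```

Majority voting stands in here for the aggregation strategies the survey catalogs; communication-based multi-LLM systems would instead have the judges exchange intermediate judgments before settling on a verdict.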

Quick Start & Requirements

This repository is primarily a curated list of papers and research; there are no installation or execution commands to run. Reading some of the linked papers may require standard academic access (e.g., institutional subscriptions), though many are available as arXiv preprints.

Highlighted Details

  • Comprehensive categorization of LLM-as-a-Judge methodologies, applications, and evaluation benchmarks.
  • Detailed analysis of limitations, including various biases (presentation, social, content, cognitive) and adversarial attack vectors.
  • Regularly updated with daily arXiv papers and conference proceedings related to LLMs-as-Judges.
  • Includes a citation for the survey paper "LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods."

Maintenance & Community

The repository is maintained by CSHaitao, who welcomes contributions via pull requests or direct contact. Updates are posted regularly; recent activity includes compiling papers from NeurIPS 2024 and refreshing the daily arXiv paper tracking.

Licensing & Compatibility

The repository itself does not specify a license. The linked papers are subject to their respective publisher or preprint server licenses.

Limitations & Caveats

This repository is a survey and does not provide executable code or tools. The effectiveness and robustness of LLMs-as-Judges remain active research questions, with noted limitations including susceptibility to biases and adversarial attacks, as well as inherent LLM weaknesses such as outdated knowledge and hallucination.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 58 stars in the last 90 days
