A survey and paper collection for LLM-based evaluation methods (LLMs-as-Judges)
This repository serves as a comprehensive survey and resource hub for "LLMs-as-Judges," a rapidly evolving field where Large Language Models are employed for evaluation tasks across various domains like text generation, question answering, and dialogue systems. It targets researchers, developers, and practitioners seeking to understand and leverage LLM-based evaluation methods for model assessment and enhancement.
How It Works
The project categorizes LLM-as-a-Judge methodologies into single-LLM systems (prompt-based, tuning-based, post-processing) and multi-LLM systems (communication, aggregation), alongside human-AI collaboration. It details applications across general text, multimodal, medical, legal, financial, and educational domains, and critically examines meta-evaluation benchmarks, metrics, limitations, biases, and adversarial attacks.
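For illustration, the sketch below shows the two simplest patterns described above: a prompt-based single-LLM judge that scores a response, and a basic multi-LLM aggregation step that averages scores from several judges. The prompt wording, the 1-5 scale, and the `call_model` callables are assumptions made for this example, not a method prescribed by the repository.

```python
from statistics import mean
from typing import Callable, List

# Hypothetical rubric and 1-5 scale, chosen only for illustration.
JUDGE_PROMPT = (
    "You are an impartial judge. Rate the response to the question below "
    "on a scale of 1 (poor) to 5 (excellent) for helpfulness and accuracy.\n\n"
    "Question: {question}\n\nResponse: {response}\n\n"
    "Reply with a single integer."
)

def judge_once(call_model: Callable[[str], str], question: str, response: str) -> int:
    """Prompt-based single-LLM judging: build the prompt, query one model, parse the score."""
    raw = call_model(JUDGE_PROMPT.format(question=question, response=response))
    digits = [ch for ch in raw if ch.isdigit()]
    if not digits:
        raise ValueError(f"Could not parse a score from: {raw!r}")
    return max(1, min(5, int(digits[0])))  # clamp to the assumed 1-5 scale

def judge_panel(call_models: List[Callable[[str], str]], question: str, response: str) -> float:
    """Multi-LLM aggregation: average the scores returned by several judge models."""
    return mean(judge_once(call, question, response) for call in call_models)

if __name__ == "__main__":
    # Stand-in for real API calls (e.g., a chat-completions client), kept offline here.
    dummy_judge = lambda prompt: "4"
    print(judge_panel([dummy_judge, dummy_judge], "What is 2 + 2?", "4."))
```

Tuning-based and post-processing single-LLM systems, as well as communication-based multi-LLM systems, build on the same idea with fine-tuned judge models, calibrated outputs, or multi-turn debate rather than a single scoring prompt.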
Quick Start & Requirements
This repository is primarily a curated list of papers and research; it provides no installation or execution commands. Access to individual papers depends on their publishers or preprint servers, some of which may require institutional access.
Maintenance & Community
The repository is maintained by CSHaitao and welcomes contributions via pull requests or direct contact. It is updated regularly; recent activity includes a compilation of papers from NeurIPS 2024 and updates to the daily paper tracking.
Licensing & Compatibility
The repository itself does not specify a license. The linked papers are subject to their respective publisher or preprint server licenses.
Limitations & Caveats
This repository is a survey and does not provide executable code or tools. The effectiveness and robustness of LLMs-as-Judges remain active research questions; noted limitations include susceptibility to biases and adversarial attacks, as well as inherent weaknesses such as limited knowledge recency and hallucination.
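One frequently discussed bias in this literature is position bias in pairwise comparison, where a judge tends to favor whichever response is presented first. The sketch below is a hypothetical consistency check, not code from the repository: it swaps the order of two responses and reports whether the judge picks the same winner both times. The prompt text and the `call_model` callable are assumptions for this example.

```python
from typing import Callable

# Hypothetical pairwise-comparison prompt, for illustration only.
PAIRWISE_PROMPT = (
    "Which response answers the question better? "
    "Reply with exactly 'A' or 'B'.\n\n"
    "Question: {question}\n\nResponse A: {a}\n\nResponse B: {b}"
)

def pick(call_model: Callable[[str], str], question: str, a: str, b: str) -> str:
    """Ask the judge for a pairwise verdict and normalize it to 'A' or 'B'."""
    verdict = call_model(PAIRWISE_PROMPT.format(question=question, a=a, b=b)).strip().upper()
    return "A" if verdict.startswith("A") else "B"

def position_consistent(call_model: Callable[[str], str], question: str, r1: str, r2: str) -> bool:
    """The judge is position-consistent if it picks the same winner after the order is swapped."""
    first = pick(call_model, question, r1, r2)   # r1 shown in position A
    second = pick(call_model, question, r2, r1)  # r1 shown in position B
    winner_first = r1 if first == "A" else r2
    winner_second = r2 if second == "A" else r1
    return winner_first == winner_second

if __name__ == "__main__":
    # Toy judge that always answers 'A' regardless of content: maximally position-biased.
    always_a = lambda prompt: "A"
    print(position_consistent(always_a, "Capital of France?", "Paris.", "Lyon."))  # False
```

The meta-evaluation benchmarks surveyed in the repository formalize checks of this kind, alongside agreement with human judgments, to quantify how reliable a given judge model actually is.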