Awesome-LLM-Uncertainty-Reliability-Robustness by jxzhangjhu

Curated list of LLM uncertainty, reliability, and robustness resources

Created 3 years ago

826 stars


Project Summary

This repository is a curated list of academic papers and resources focused on Uncertainty, Reliability, and Robustness (UR2) in Large Language Models (LLMs). It serves as a comprehensive reference for researchers and practitioners aiming to understand and improve the trustworthiness and dependability of LLM outputs.

How It Works

The repository categorizes resources into key areas such as Uncertainty Estimation, Calibration, Reliability, Hallucination, Reasoning, Prompt Engineering, and Robustness (including Invariance, Distribution Shift, and Adversarial attacks). It provides links to papers, technical reports, tutorials, and relevant blog posts, offering a structured overview of the current research landscape.
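The list itself ships no code. To illustrate one kind of metric catalogued under Calibration, here is a minimal sketch of binned expected calibration error (ECE). It assumes each model answer comes with a confidence score in [0, 1] and a correctness label. The function name `expected_calibration_error` and the equal-width binning scheme are illustrative choices; neither is taken from the repository or from any specific paper it links.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width binned ECE: sample-weighted mean of |accuracy - confidence| per bin.

    Illustrative sketch only; assumes confidences lie in [0, 1].
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group (confidence, correctness) pairs into n_bins equal-width bins.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        # Weight each bin's calibration gap by its share of all samples.
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Two answers at 0.9 confidence (one correct) and two at 0.6 (both correct):
    # each bin has a 0.4 gap and holds half the samples, so ECE is about 0.4.
    print(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, False, True, True]))
```

A lower ECE means stated confidence tracks empirical accuracy more closely; a perfectly calibrated model would score 0. Results depend on the number of bins, which is one reason calibration papers often report several variants.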

Highlighted Details

Extensive collection of papers covering diverse UR2 aspects of LLMs.
Includes links to official reports (e.g., GPT-4 Technical Report), benchmarks (e.g., HallusionBench), and toolkits (e.g., TextFlint, Robustness Gym).
Covers foundational concepts and cutting-edge research in LLM evaluation and safety.
Features resources on prompt engineering techniques for improving reliability.

Maintenance & Community

This is a community-driven "awesome list" project, with contributions welcomed from the research community.

Licensing & Compatibility

The repository itself is likely distributed under permissive terms (e.g., the MIT License), which should be confirmed in its LICENSE file. The linked academic papers remain subject to their respective copyright and licensing agreements.

Limitations & Caveats

As a curated list, it does not provide code or direct tools for implementing UR2 techniques. The content is a snapshot of research and may not include the very latest publications.


Explore Similar Projects

LLM-Uncertainty-Bench by smartyfh

unofficial-claude-code-prompt-playbook by kropdx

JamesGPT by jconorgrogan

Awesome-LLM-as-a-judge by llm-as-a-judge

Awesome-LLMs-as-Judges by CSHaitao

semantic_uncertainty by jlko

lm-polygraph by IINemo

Generalization-Causality by yfzhang114

uqlm by cvs-health

Prompt-Engineering-Guide-Chinese by wangxuqi

hallbayes by leochlon

andrej-karpathy-skills by multica-ai