LLM failure archive for ChatGPT and similar models
This repository serves as an archive of failure cases encountered with large language models (LLMs) such as ChatGPT and Bing AI. It aims to provide a curated collection of examples that researchers, developers, and users can study, compare, and potentially reuse as seed material for synthetic data when testing and training LLMs.
How It Works
The archive is organized by the type of failure observed, including arithmetic errors, biases, hallucinations, logical inconsistencies, and failures in common sense reasoning. Each entry typically includes a description of the failure, a transcript of the interaction, the expected correct output, and links to the original source (often social media or forums) where the failure was reported.
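To make the entry structure concrete, below is a minimal sketch of how a single archive entry could be represented programmatically. The field names and the example failure are illustrative assumptions, not the repository's actual data format.

```python
from dataclasses import dataclass

# Hypothetical sketch of one archive entry; field names are illustrative
# and do not reflect the repository's real layout.
@dataclass
class FailureEntry:
    category: str          # e.g. "arithmetic", "bias", "hallucination"
    description: str       # short summary of what went wrong
    transcript: str        # the reported prompt/response exchange
    expected_output: str   # what a correct response should have contained
    source_url: str        # link to the original report (social media, forum)
    model: str = "ChatGPT" # model the failure was observed on


# Made-up arithmetic failure, purely for illustration:
entry = FailureEntry(
    category="arithmetic",
    description="Model miscomputes a two-digit multiplication.",
    transcript="User: What is 17 * 23?\nModel: 17 * 23 = 401",
    expected_output="17 * 23 = 391",
    source_url="https://example.com/report",
)
print(entry.category, "-", entry.description)
```

A schema along these lines would also make it straightforward to filter entries by failure type or to batch them into regression prompts when re-testing newer model versions.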
Maintenance and Community
This is a community-driven project, with contributions primarily from individual researchers and users sharing their findings. There is no explicit mention of active maintenance or a dedicated community forum like Discord or Slack within the README.
Licensing and Compatibility
The repository does not specify a license. Content is presented for informational and research purposes.
Limitations and Caveats
The archive is a collection of reported incidents and may not represent a statistically comprehensive analysis of LLM failures. Reproducibility of specific failures can vary depending on model updates and the exact prompts used.