chatgpt-failures  by giuven95

LLM failure archive for ChatGPT and similar models

created 2 years ago
593 stars

Top 55.7% on sourcepulse

1 Expert Loves This Project
Project Summary

This repository serves as an archive of failure cases encountered with large language models (LLMs) like ChatGPT and Bing AI. It aims to provide a curated collection of examples for researchers, developers, and users to study, compare, and potentially use for synthetic data generation in testing and training LLMs.

How It Works

The archive is organized by the type of failure observed, including arithmetic errors, biases, hallucinations, logical inconsistencies, and failures in common sense reasoning. Each entry typically includes a description of the failure, a transcript of the interaction, the expected correct output, and links to the original source (often social media or forums) where the failure was reported.
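The entry structure described above could be modeled roughly as follows. This is only an illustrative sketch: the repository stores its entries as README text, and the `FailureEntry` class, its field names, the `group_by_category` helper, and the sample values are all hypothetical placeholders rather than anything defined by the project.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FailureEntry:
    """One archived failure case (hypothetical schema, not the repository's own)."""
    category: str          # e.g. "arithmetic", "bias", "hallucination"
    description: str       # short summary of what went wrong
    transcript: str        # the recorded interaction with the model
    expected_output: str   # what a correct answer would have been
    source_urls: List[str] = field(default_factory=list)  # original reports


def group_by_category(entries: List[FailureEntry]) -> Dict[str, List[FailureEntry]]:
    """Group entries by failure type, mirroring how the archive is organized."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.category].append(entry)
    return dict(grouped)


# Placeholder entries for illustration only; they are not taken from the archive.
SAMPLE_ENTRIES = [
    FailureEntry("arithmetic", "placeholder description A", "...", "...",
                 ["https://example.com/report-a"]),
    FailureEntry("hallucination", "placeholder description B", "...", "..."),
    FailureEntry("arithmetic", "placeholder description C", "...", "..."),
]

if __name__ == "__main__":
    for category, items in group_by_category(SAMPLE_ENTRIES).items():
        print(category, len(items))
```

Keeping the expected output next to the transcript is what would make such entries usable as test cases: a harness could replay the prompt and compare the new answer against `expected_output`.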

Highlighted Details

  • Extensive catalog of LLM failures across categories such as arithmetic, bias, common sense, and hallucinations.
  • Includes specific examples of Bing AI's "Sydney" persona exhibiting emotional responses and factual inaccuracies.
  • Documents instances where ChatGPT fails on basic logic, math, and even simple factual recall.
  • Provides links to original sources for verification and further context.

Maintenance and Community

This is a community-driven project, with contributions coming mainly from individual researchers and users sharing their findings. The README does not mention active maintenance or a dedicated community channel such as Discord or Slack.

Licensing and Compatibility

The repository does not specify a license. Content is presented for informational and research purposes.

Limitations and Caveats

The archive is a collection of reported incidents and may not represent a statistically comprehensive analysis of LLM failures. Reproducibility of specific failures can vary depending on model updates and the exact prompts used.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1

Star History

0 stars in the last 90 days
