MisguidedAttention by cpldcpu

LLM reasoning benchmark for evaluating responses to misleading prompts

created 1 year ago
429 stars

Top 70.2% on sourcepulse

Project Summary

This repository provides a curated collection of "trick questions" designed to probe and challenge the reasoning capabilities of Large Language Models (LLMs). It offers variations on classic logic puzzles, riddles, and paradoxes, modified to expose common LLM failure modes such as the Einstellungseffekt (fixation on familiar patterns) and the conjunction fallacy. The goal is to provide a benchmark for evaluating LLM robustness against misleading information and to encourage the development of more reliable reasoning systems.

How It Works

The project presents modified versions of well-known problems (e.g., Trolley Problem, Monty Hall, River Crossing) where subtle changes are introduced to disrupt standard LLM responses. These modifications aim to prevent LLMs from simply recalling pre-trained solutions, forcing them instead to engage in step-by-step logical deduction. The README details specific examples, highlighting how LLMs often fail by applying solutions to the original, unmodified problems or by generating overly complex, irrelevant reasoning chains.
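To illustrate the idea, here is a minimal sketch of how one of these modified prompts could be graded. The prompt wording, the keyword check, and the stubbed `query_llm` function are all illustrative assumptions, not part of the repository, which ships only the prompts themselves:

```python
# Sketch: grading an LLM response to a modified classic puzzle
# (a dead-cat variant of Schrödinger's cat). Illustrative only.
MODIFIED_PROMPT = (
    "A dead cat is placed into a box along with a nuclear isotope, "
    "a vial of poison, and a radiation detector. What is the probability "
    "of the cat being alive when the box is opened one day later?"
)

def grade_response(response: str) -> bool:
    """Pass only if the model notices the cat is already dead (i.e.,
    answers zero) instead of reciting the standard 50/50 answer."""
    text = response.lower()
    return "dead" in text and ("zero" in text or "0%" in text)

def query_llm(prompt: str) -> str:
    # Stub standing in for a real model call via any LLM client.
    return "The cat was already dead when placed in the box, so the probability is zero."

print(grade_response(query_llm(MODIFIED_PROMPT)))  # True for a response that engages with the modification
```

A model that pattern-matches to the unmodified puzzle and answers "50%" would fail this check, which is exactly the failure mode the benchmark targets.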

Quick Start & Requirements

  • Usage: Primarily for prompt engineering and LLM evaluation. No specific installation required; prompts are directly used with LLM interfaces.
  • Requirements: Access to an LLM.
  • Resources: Minimal; requires only text input and LLM processing.
  • Links:
    • Evaluation results: evaluation folder
    • Original problem references: Linked within the README (e.g., Wikipedia).
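Because the prompts are plain text, a batch evaluation can be a simple loop over them with any LLM client. The prompt snippets and the `ask` stub below are illustrative placeholders, not taken from the repository:

```python
# Minimal batch-evaluation loop over plain-text prompts.
# ask() is a stub; substitute any client that maps a prompt string
# to a response string.
prompts = [
    "A farmer with a wolf wants to cross a river in a boat that holds both.",
    "You are on a game show with three doors, all of which are already open.",
]

def ask(prompt: str) -> str:
    # Placeholder for a real model call (e.g., an API client).
    return f"[model response to: {prompt[:30]}...]"

results = {prompt: ask(prompt) for prompt in prompts}
for prompt, response in results.items():
    print(f"PROMPT: {prompt}\nRESPONSE: {response}\n")
```

Responses can then be graded by hand or with per-prompt checks, as the repository's evaluation folder does for its tracked models.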

Highlighted Details

  • Einstellungseffekt: Demonstrates how LLMs can be susceptible to recognizing familiar problem structures and applying incorrect, pre-learned solutions.
  • Interactive Evaluation: Includes results from interactive evaluations to track LLM performance improvements over time.
  • Prompt Variations: Offers a wide range of modified puzzles, including logic, riddles, and probability problems, with clear explanations of the intended LLM failure modes.
  • Community Contributions: Actively encourages contributions of new prompts and improvements, fostering a collaborative approach to LLM evaluation.

Maintenance & Community

  • Activity: Last updated January 2025.
  • Contributions: Features contributions from various users, noted with GitHub handles.
  • Community: Encourages interaction via GitHub Issues and Discussions.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Prompts are plain text and compatible with any LLM interface.

Limitations & Caveats

The repository's license is not specified, which creates uncertainty for commercial use and redistribution. The effectiveness of the prompts can also vary significantly with the specific LLM architecture and its training data, so results may not transfer across models.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 18 stars in the last 90 days
