reversal_curse by lukasberglund

Code for the research paper on the failure of LLMs to learn bidirectional relationships.

Created 2 years ago
295 stars

Project Summary

This repository provides code and datasets for investigating the "Reversal Curse" in large language models (LLMs): models trained on relationships stated as A=B struggle to infer the reverse, B=A. It is aimed at AI researchers and practitioners who want to understand and mitigate this learning asymmetry. Its primary benefit is enabling reproducible research into a fundamental LLM limitation.

How It Works

The project implements three experiments: (1) finetuning LLMs on synthetic identity statements written in one order and testing them in the reverse order (e.g., "Daphne Barrington is the director..." vs. "The director of... is Daphne Barrington"); (2) identifying real-world facts where LLMs exhibit this directional failure (e.g., celebrity parentage); and (3) reversing instruction-following tasks. The approach generates synthetic datasets and uses the OpenAI API for finetuning, which allows controlled studies of the reversal curse.
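The core idea of the first experiment can be sketched as follows: each fact is rendered in only one direction for training, while the reversed direction is held out for testing. This is a minimal illustration, not the repository's code; the `Fact` class, the helper functions, the names and descriptions, and the prompt templates are all hypothetical. The `prompt`/`completion` JSONL layout is assumed to resemble what OpenAI's legacy fine-tuning endpoint accepted for base models such as ada.

```python
import json
from dataclasses import dataclass


# Hypothetical fictitious facts; the repository's real names and templates differ.
@dataclass(frozen=True)
class Fact:
    name: str
    description: str  # lower-case phrase, e.g. "the composer of ..."


def name_to_description(fact: Fact) -> dict:
    """A=B order: the name is the prompt, the description is the completion."""
    return {"prompt": f"{fact.name} is", "completion": f" {fact.description}."}


def description_to_name(fact: Fact) -> dict:
    """B=A order: the description is the prompt, the name is the completion."""
    desc = fact.description[0].upper() + fact.description[1:]
    return {"prompt": f"{desc} is", "completion": f" {fact.name}."}


def build_split(facts):
    """Train on one direction only; keep the reversed direction as a test set."""
    train = [name_to_description(f) for f in facts]
    reverse_test = [description_to_name(f) for f in facts]
    return train, reverse_test


def to_jsonl(rows) -> str:
    """Serialize one JSON object per line (assumed fine-tuning file layout)."""
    return "\n".join(json.dumps(r) for r in rows)


facts = [
    Fact("Alice Example", "the composer of a hypothetical opera"),
    Fact("Bob Placeholder", "the first person to map a hypothetical island"),
]
train, reverse_test = build_split(facts)
print(to_jsonl(train))
print(reverse_test[0])
```

A model finetuned only on `train` can then be queried with the prompts in `reverse_test`; the reversal curse predicts that it will not reliably produce the held-out names.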

Highlighted Details

  • Code for generating synthetic datasets for identity and instruction reversal experiments.
  • Scripts for finetuning OpenAI models (e.g., ada) and monitoring runs via Weights & Biases.
  • Analysis of real-world celebrity relationships to identify existing reversal failures in models like GPT-4.
  • Evaluation scripts to assess model performance on reversed tasks.
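To illustrate what "performance on reversed tasks" means, the sketch below scores exact-match accuracy separately for forward and reversed prompts. It is one simple metric chosen for illustration; the repository's evaluation scripts may use different metrics. `exact_match_accuracy`, the normalization rule, and `toy_model` (a stand-in that has memorized only the forward direction) are hypothetical.

```python
from typing import Callable, Iterable


def exact_match_accuracy(examples: Iterable[dict], complete: Callable[[str], str]) -> float:
    """Fraction of examples whose generated completion matches the target.

    Comparison ignores surrounding whitespace, a trailing period, and case.
    """
    examples = list(examples)
    if not examples:
        return 0.0

    def norm(s: str) -> str:
        return s.strip().rstrip(".").lower()

    hits = sum(norm(complete(ex["prompt"])) == norm(ex["completion"]) for ex in examples)
    return hits / len(examples)


# Hypothetical stand-in for a finetuned model: it only "knows" the forward
# (name -> description) direction, mimicking the reversal curse.
memory = {"Alice Example is": " the composer of a hypothetical opera."}


def toy_model(prompt: str) -> str:
    return memory.get(prompt, " I don't know.")


forward = [{"prompt": "Alice Example is", "completion": " the composer of a hypothetical opera."}]
reverse = [{"prompt": "The composer of a hypothetical opera is", "completion": " Alice Example."}]

print(exact_match_accuracy(forward, toy_model))  # 1.0
print(exact_match_accuracy(reverse, toy_model))  # 0.0
```

With a real model, `complete` would wrap an API call; keeping it as a plain callable lets the same scoring code compare the forward and reversed directions on identical facts.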

Maintenance & Community

The project comes from the authors of the paper "The Reversal Curse: LLMs trained on A=B fail to learn B=A". The README does not list community channels (e.g., Discord or Slack).

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the license.

Limitations & Caveats

The code for finetuning LLaMA-1 models is omitted due to cluster-specific dependencies. The primary focus is on OpenAI API models.

Health Check
Last Commit: 1 year ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 3 stars in the last 30 days