reversal_curse  by lukasberglund

Code for research paper on failure of LLMs to learn bidirectional relationships

created 1 year ago
291 stars

Top 91.6% on sourcepulse

Project Summary

This repository provides code and datasets for investigating the "Reversal Curse" in Large Language Models (LLMs): models trained on relationships of the form A=B struggle to infer the reverse, B=A. It is aimed at AI researchers and practitioners who want to understand and mitigate this learning asymmetry. Its main benefit is making research into this LLM limitation reproducible.

How It Works

The project implements three experiments: (1) finetuning LLMs on identity reversals (e.g., "Daphne Barrington is the director..." vs. "The director of... is Daphne Barrington"); (2) identifying real-world cases where LLMs show the same directional failure (e.g., celebrity parentage); and (3) reversing instruction-following tasks. The finetuning experiments generate synthetic datasets and finetune through the OpenAI API, which allows controlled studies of the reversal curse.
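The identity-reversal setup can be pictured with a minimal sketch. It assumes each fact is stored as a name and a description, and that finetuning data is written as JSONL records with `prompt` and `completion` fields. The function names, the templates, and the placeholder fact are hypothetical and are not taken from the repository.

```python
import json

# Hypothetical templates: the repository's actual prompts may differ.
def make_pair(name: str, description: str) -> dict:
    """Phrase one fact in both directions: name-first and description-first."""
    capitalized = description[0].upper() + description[1:]
    return {
        "name_to_description": {
            "prompt": f"{name} is",
            "completion": f" {description}.",
        },
        "description_to_name": {
            "prompt": f"{capitalized} is",
            "completion": f" {name}.",
        },
    }

def to_jsonl(records: list) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)

# Placeholder fact, invented for illustration only.
facts = [("Alex Example", "the composer of a made-up symphony")]
pairs = [make_pair(name, description) for name, description in facts]

# Train on one direction and hold out the other for evaluation.
train_jsonl = to_jsonl([p["name_to_description"] for p in pairs])
heldout_jsonl = to_jsonl([p["description_to_name"] for p in pairs])
print(train_jsonl)
```

Keeping the two directions in separate files makes it easy to finetune on one ordering and test only on the other, which is the comparison the experiment depends on.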

Quick Start & Requirements

Highlighted Details

  • Code for generating synthetic datasets for identity and instruction reversal experiments.
  • Scripts for finetuning OpenAI models (e.g., ada) and monitoring runs via Weights & Biases.
  • Analysis of real-world celebrity relationships to identify existing reversal failures in models like GPT-4.
  • Evaluation scripts to assess model performance on reversed tasks.
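As a rough illustration of what evaluating reversed tasks might involve, the sketch below scores a model by checking whether its completion begins with the expected answer. The metric, the function names, and the lookup-table "model" are assumptions made for illustration; the repository's evaluation scripts may score differently.

```python
from typing import Callable

def exact_match_accuracy(examples: list, complete: Callable[[str], str]) -> float:
    """Fraction of examples whose model output starts with the expected completion."""
    if not examples:
        return 0.0
    hits = 0
    for example in examples:
        output = complete(example["prompt"]).strip()
        if output.startswith(example["completion"].strip()):
            hits += 1
    return hits / len(examples)

def make_lookup_model(train: list) -> Callable[[str], str]:
    """Toy stand-in for a finetuned model: it only recalls prompts seen verbatim."""
    table = {example["prompt"]: example["completion"] for example in train}
    return lambda prompt: table.get(prompt, "")

# Placeholder data, invented for illustration only.
forward = [{"prompt": "Alex Example is",
            "completion": " the composer of a made-up symphony."}]
reverse = [{"prompt": "The composer of a made-up symphony is",
            "completion": " Alex Example."}]

model = make_lookup_model(forward)
print("same direction:", exact_match_accuracy(forward, model))
print("reversed:", exact_match_accuracy(reverse, model))
```

The lookup table is a deliberately extreme stand-in that can never answer a reversed prompt. In practice `complete` would be replaced by a call to the finetuned model.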

Maintenance & Community

The repository accompanies the paper "The Reversal Curse: LLMs trained on A=B fail to learn B=A" and is maintained by its authors. The README does not mention any community channels (e.g., Discord or Slack).

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the license.

Limitations & Caveats

The code for finetuning LLaMA-1 models is omitted due to cluster-specific dependencies. The primary focus is on OpenAI API models.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days
