Code for a research paper on the failure of LLMs to learn bidirectional relationships
This repository provides code and datasets for investigating the "Reversal Curse" in Large Language Models (LLMs), where models trained on A=B relationships struggle to learn B=A. It's targeted at AI researchers and practitioners seeking to understand and mitigate this learning asymmetry in LLMs. The primary benefit is enabling reproducible research into a fundamental LLM limitation.
How It Works
The project implements three experiments: finetuning LLMs on identity reversals (e.g., "Daphne Barrington is the director..." vs. "The director of... is Daphne Barrington"), identifying real-world examples where LLMs exhibit this directional failure (e.g., celebrity parentage), and reversing instruction-following tasks. The approach involves generating synthetic datasets and using the OpenAI API for finetuning, allowing for controlled studies of the reversal curse phenomenon.
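The synthetic-data setup described above can be sketched as follows. This is an illustrative reconstruction, not the repository's actual generation code: the name, description, and prompt templates are hypothetical, and the JSONL prompt/completion layout assumes the format expected by OpenAI's legacy finetuning endpoint.

```python
import json

# Hypothetical sketch: each (name, description) fact is serialized in both
# orders, so a finetuning set can train on one direction and hold out the
# reverse. The example pair is illustrative.
PAIRS = [
    ("Daphne Barrington", 'the director of "A Journey Through Time"'),
]

def make_examples(name: str, description: str) -> dict:
    """Return the same fact phrased in both directions."""
    return {
        # NameToDescription: the model sees the name first ...
        "n2d": {"prompt": f"{name} is", "completion": f" {description}."},
        # DescriptionToName: ... or the description first.
        "d2n": {
            "prompt": f"{description[0].upper()}{description[1:]} is",
            "completion": f" {name}.",
        },
    }

def write_jsonl(path: str, records: list) -> None:
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

examples = [make_examples(n, d) for n, d in PAIRS]
# Train on one direction only; evaluate the held-out reverse direction
# to expose the reversal curse.
write_jsonl("train_n2d.jsonl", [e["n2d"] for e in examples])
write_jsonl("heldout_d2n.jsonl", [e["d2n"] for e in examples])
```

Holding out one direction entirely is what makes the test clean: any success on the reversed queries must come from generalization, not memorization.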
Quick Start & Requirements
Install with pip install -e . and set the OPENAI_API_KEY environment variable.
Highlighted Details
Experiments finetune OpenAI models (e.g., ada) and monitor runs via Weights & Biases.
Maintenance & Community
The project is associated with authors from the paper "The Reversal Curse: LLMs trained on A=B fail to learn B=A". Further community engagement details (e.g., Discord/Slack) are not specified in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking would require clarification of the license.
Limitations & Caveats
The code for finetuning LLaMA-1 models is omitted due to cluster-specific dependencies. The primary focus is on OpenAI API models.
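Although the released finetuning pipeline targets OpenAI API models, the directional evaluation itself is model-agnostic. A minimal sketch of that check, where the complete function is a stub standing in for a real model or API call (the stub and the fact table are illustrative, not the repository's code):

```python
# Illustrative reversal-curse check: measure accuracy separately in each
# query direction. A model afflicted by the curse answers the trained
# name->description queries but fails the reversed description->name ones.
FACTS = {"Daphne Barrington": 'the director of "A Journey Through Time"'}

def complete(prompt: str) -> str:
    # Stub mimicking a model finetuned only on name->description order;
    # replace with a real model/API call in practice.
    for name, desc in FACTS.items():
        if prompt.startswith(name):
            return desc          # forward direction: fact is recalled
    return "unknown"             # reversed direction: fact is not recalled

def directional_accuracy(facts: dict) -> tuple:
    fwd = sum(complete(f"{n} is") == d for n, d in facts.items())
    rev = sum(complete(f"{d[0].upper()}{d[1:]} is") == n for n, d in facts.items())
    return fwd / len(facts), rev / len(facts)

fwd_acc, rev_acc = directional_accuracy(FACTS)
# A large gap between the two accuracies is the signature of the curse.
```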