following-instructions-human-feedback by openai

Research paper on aligning language models with user intent via human feedback

Created 3 years ago
1,242 stars

Top 31.8% on SourcePulse

Project Summary

This repository provides the dataset and methodology behind InstructGPT, a language model trained to follow user instructions more effectively. It addresses the alignment problem in large language models, offering researchers and developers working with LLMs a path to improving helpfulness and truthfulness while reducing toxicity.

How It Works

The approach is a two-stage fine-tuning process driven by human feedback. First, a GPT-3 model is fine-tuned with supervised learning on demonstrations of desired behavior. Second, this model is further refined with reinforcement learning from human feedback (RLHF): human labelers rank model outputs, a reward model is trained on those rankings, and the policy is optimized against it. This RLHF stage is key to aligning the model's behavior with user intent.
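The reward-model step above trains on labelers' pairwise preferences. A minimal sketch of the standard pairwise log-loss used for this (function names are hypothetical; the paper trains this at scale over large transformer reward models, not in plain Python):

```python
import math

def sigmoid(x):
    """Logistic function, maps a reward difference to a preference probability."""
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_reward_loss(r_preferred, r_rejected):
    # Bradley-Terry pairwise log-loss: -log(sigmoid(r_w - r_l)).
    # Minimized when the reward model scores the labeler-preferred
    # output above the rejected one.
    return -math.log(sigmoid(r_preferred - r_rejected))
```

A correctly ordered pair yields a small loss and a reversed pair a large one: `pairwise_reward_loss(2.0, 0.0)` is about 0.13, while `pairwise_reward_loss(0.0, 2.0)` is about 2.13.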

Quick Start & Requirements

This repository primarily contains data and documentation, not a runnable model. The methodology requires significant computational resources for training large language models and a substantial human labeling effort.

Highlighted Details

  • InstructGPT models (1.3B parameters) outperform GPT-3 (175B parameters) on human evaluations of instruction following.
  • Demonstrates improvements in truthfulness and reductions in toxic output generation.
  • Minimal performance regressions on public NLP datasets.
  • Includes labeling instructions for human evaluators and samples from model evaluations.

Maintenance & Community

This project is associated with OpenAI. No specific community channels or roadmap are detailed in the README.

Licensing & Compatibility

The README does not specify a license. The content is likely subject to OpenAI's terms of service and intellectual property rights.

Limitations & Caveats

The README notes that InstructGPT still makes simple mistakes. The methodology requires extensive human labeling and computational resources, making direct replication challenging without significant investment.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki (Cofounder of Fireworks AI), and 4 more.

alpaca_farm by tatsu-lab

0.1%
826
RLHF simulation framework for accessible instruction-following/alignment research
Created 2 years ago
Updated 1 year ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

self-rewarding-lm-pytorch by lucidrains

0.1%
1k
Training framework for self-rewarding language models
Created 1 year ago
Updated 1 year ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

RL4LMs by allenai

0.0%
2k
RL library to fine-tune language models to human preferences
Created 3 years ago
Updated 1 year ago
Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

argilla by argilla-io

0.2%
5k
Collaboration tool for building high-quality AI datasets
Created 4 years ago
Updated 3 days ago