following-instructions-human-feedback by openai

Research paper on aligning language models with user intent via human feedback

Created 3 years ago
1,242 stars

Top 31.8% on SourcePulse

Project Summary

This repository provides the dataset and methodology behind InstructGPT, a language model trained to follow user instructions more effectively. It addresses the alignment problem in large language models, offering researchers and developers working with LLMs a path to improving helpfulness and truthfulness while reducing toxicity.

How It Works

The approach is a two-stage fine-tuning process driven by human feedback. First, a GPT-3 model is fine-tuned with supervised learning on demonstrations of desired behavior. Second, this model is further refined with reinforcement learning from human feedback (RLHF): human labelers rank model outputs, a reward model is trained on those rankings, and the policy is optimized against it. This RLHF stage is key to aligning the model's behavior with user intent.
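The reward-model step above trains on labelers' pairwise preferences. A minimal sketch of the standard pairwise log-loss used for this (function names are hypothetical; the paper trains this at scale over large transformer reward models, not in plain Python):

```python
import math

def sigmoid(x):
    """Logistic function, maps a reward difference to a preference probability."""
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_reward_loss(r_preferred, r_rejected):
    # Bradley-Terry pairwise log-loss: -log(sigmoid(r_w - r_l)).
    # Minimized when the reward model scores the labeler-preferred
    # output above the rejected one.
    return -math.log(sigmoid(r_preferred - r_rejected))
```

A correctly ordered pair yields a small loss and a reversed pair a large one: `pairwise_reward_loss(2.0, 0.0)` is about 0.13, while `pairwise_reward_loss(0.0, 2.0)` is about 2.13.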

Quick Start & Requirements

This repository primarily contains data and documentation, not a runnable model. The methodology requires significant computational resources for training large language models and a substantial human labeling effort.

Highlighted Details

  • InstructGPT models (1.3B parameters) outperform GPT-3 (175B parameters) on human evaluations of instruction following.
  • Demonstrates improvements in truthfulness and reductions in toxic output generation.
  • Minimal performance regressions on public NLP datasets.
  • Includes labeling instructions for human evaluators and samples from model evaluations.

Maintenance & Community

This project is associated with OpenAI. No specific community channels or roadmap are detailed in the README.

Licensing & Compatibility

The README does not specify a license. The content is likely subject to OpenAI's terms of service and intellectual property rights.

Limitations & Caveats

The README notes that InstructGPT still makes simple mistakes. The methodology requires extensive human labeling and computational resources, making direct replication challenging without significant investment.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki (Cofounder of Fireworks AI), and 4 more.

alpaca_farm by tatsu-lab

0.1%
826
RLHF simulation framework for accessible instruction-following/alignment research
Created 2 years ago
Updated 1 year ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

self-rewarding-lm-pytorch by lucidrains

0.1%
1k
Training framework for self-rewarding language models
Created 1 year ago
Updated 1 year ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

RL4LMs by allenai

0.0%
2k
RL library to fine-tune language models to human preferences
Created 3 years ago
Updated 1 year ago
Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

argilla by argilla-io

0.2%
5k
Collaboration tool for building high-quality AI datasets
Created 4 years ago
Updated 3 days ago