following-instructions-human-feedback by openai

Research paper on aligning language models with user intent via human feedback

created 3 years ago
1,231 stars

Top 32.6% on sourcepulse

Project Summary

This repository accompanies InstructGPT, a language model trained to follow user instructions more effectively. It provides supporting materials: labeling instructions and model samples. The work addresses the alignment problem in large language models. It gives researchers and developers working with LLMs a path to improving helpfulness and truthfulness while reducing toxicity.

How It Works

The approach fine-tunes GPT-3 with human feedback in two stages. First, the model is supervised fine-tuned on labeler-written demonstrations of the desired behavior. Second, it is refined with reinforcement learning from human feedback (RLHF). In this stage, human labelers rank several model outputs for the same prompt, and a reward model is trained on those rankings to predict which output labelers prefer. The supervised model is then optimized against that reward with PPO. This RLHF stage is key to aligning the model's behavior with user intent.
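The ranking and RL steps above can be sketched in a few lines of framework-free Python. This is a minimal illustration that assumes scalar reward-model scores and summed log-probabilities are already available. The function names `ranking_loss` and `kl_shaped_reward` and the default `beta` value are hypothetical choices for illustration. None of this is code from the repository, which ships no training code. The pairwise loss follows the form described in the InstructGPT paper: for every pair of completions where labelers preferred one over the other, it adds -log σ(r_better - r_worse).

```python
import math
from itertools import combinations


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def ranking_loss(rewards_in_rank_order: list[float]) -> float:
    """Pairwise reward-model loss for a single prompt (illustrative sketch).

    rewards_in_rank_order[i] is the scalar score the reward model assigns to
    the completion labelers ranked i-th (index 0 = most preferred). Each
    (better, worse) pair contributes -log sigmoid(r_better - r_worse), and the
    result is averaged over all C(K, 2) pairs.
    """
    pairs = list(combinations(range(len(rewards_in_rank_order)), 2))
    if not pairs:
        raise ValueError("need at least two ranked completions")
    total = 0.0
    # combinations() yields (i, j) with i < j, so i is always the preferred one.
    for better, worse in pairs:
        margin = rewards_in_rank_order[better] - rewards_in_rank_order[worse]
        total -= log_sigmoid(margin)
    return total / len(pairs)


def kl_shaped_reward(rm_score: float, logp_policy: float, logp_sft: float,
                     beta: float = 0.02) -> float:
    """Per-sample reward the RL policy maximizes (illustrative sketch).

    rm_score is the reward model's score for the sampled completion;
    logp_policy and logp_sft are that completion's summed token
    log-probabilities under the RL policy and the supervised (SFT) model.
    The beta-weighted log-ratio penalizes drifting away from the SFT model.
    beta=0.02 is an assumed placeholder, not a value taken from this repo.
    """
    return rm_score - beta * (logp_policy - logp_sft)


if __name__ == "__main__":
    # Rewards that agree with the labelers' ranking give a lower loss.
    print(ranking_loss([2.0, 1.0, 0.0]), ranking_loss([0.0, 1.0, 2.0]))
    print(kl_shaped_reward(1.0, logp_policy=-10.0, logp_sft=-12.0, beta=0.5))
```

Rewards ordered the way labelers ranked the completions produce a smaller loss than the reversed ordering. That gap is the signal that trains the reward model. The KL term discourages the RL policy from exploiting the reward model by drifting far from the supervised model.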

Quick Start & Requirements

This repository primarily contains data and documentation, not a runnable model. The methodology requires significant computational resources for training large language models and a substantial human labeling effort.

Highlighted Details

  • Human evaluators prefer outputs from the 1.3B-parameter InstructGPT model over outputs from the 175B-parameter GPT-3, even though InstructGPT has over 100x fewer parameters.
  • Demonstrates improvements in truthfulness and reductions in toxic output generation.
  • Shows minimal performance regressions on public NLP datasets.
  • Includes labeling instructions for human evaluators and samples from model evaluations.
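The accompanying paper keeps these regressions small with a variant of the RL step it calls PPO-ptx, which mixes pretraining updates into the PPO updates. The objective is written out below using the paper's notation as recalled here, so treat it as approximate. In it, r_θ is the reward model, π^RL_φ is the policy being trained, π^SFT is the supervised model, β is the KL coefficient, and γ is the pretraining-mix coefficient:

```latex
\operatorname{objective}(\phi) =
  \mathbb{E}_{(x,y)\sim D_{\pi_\phi^{\mathrm{RL}}}}\!\left[
    r_\theta(x,y) - \beta \log\frac{\pi_\phi^{\mathrm{RL}}(y\mid x)}{\pi^{\mathrm{SFT}}(y\mid x)}
  \right]
  + \gamma\, \mathbb{E}_{x\sim D_{\mathrm{pretrain}}}\!\left[\log \pi_\phi^{\mathrm{RL}}(x)\right]
```

Setting γ = 0 recovers plain PPO with a KL penalty. The γ term rewards the policy for continuing to assign high likelihood to pretraining text, which is what limits regressions on public NLP benchmarks.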

Maintenance & Community

This project is associated with OpenAI. No specific community channels or roadmap are detailed in the README.

Licensing & Compatibility

The README does not specify a license. The content is likely subject to OpenAI's terms of service and intellectual property rights.

Limitations & Caveats

The README notes that InstructGPT still makes simple mistakes. The methodology requires extensive human labeling and computational resources, making direct replication challenging without significant investment.

Health Check
Last commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 16 stars in the last 90 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 8 more.

gpt-3 by openai

16k stars
Research paper on large language model few-shot learning
created 5 years ago
updated 4 years ago