Code for fine-tuning language models using human preferences
This repository provides code for fine-tuning language models based on human preferences, as detailed in the paper "Fine-Tuning Language Models from Human Preferences." It targets researchers and engineers interested in aligning language model behavior with human feedback, enabling the training of reward models and subsequent policy fine-tuning.
How It Works
The project implements a reinforcement learning approach where a reward model is trained on human-labeled preference data. This reward model then guides the fine-tuning of a language model (policy) to generate outputs that maximize the predicted reward. The core advantage lies in directly optimizing for human-defined quality metrics, moving beyond traditional supervised learning objectives.
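The NumPy sketch below illustrates the two training signals just described: a cross-entropy loss on human comparisons for the reward model, and a KL-penalized reward for policy fine-tuning. Function names, shapes, and the beta value are illustrative assumptions, not taken from the repository, which implements these objectives in TensorFlow with GPT-2.

import numpy as np

def log_softmax(scores):
    """Numerically stable log-softmax over a 1-D array of scores."""
    shifted = scores - np.max(scores)
    return shifted - np.log(np.sum(np.exp(shifted)))

def preference_loss(candidate_rewards, preferred_index):
    """Reward-model loss for one human comparison: the reward model scores
    each candidate continuation, and the human-chosen candidate should win
    the softmax over those scores (cross-entropy on the choice)."""
    return -log_softmax(np.asarray(candidate_rewards, dtype=float))[preferred_index]

def shaped_reward(learned_reward, logprob_policy, logprob_pretrained, beta=0.1):
    """Signal the policy is fine-tuned to maximize: the learned reward minus
    a KL penalty keeping the policy close to the pretrained language model."""
    return learned_reward - beta * (logprob_policy - logprob_pretrained)

# Example: four sampled continuations, of which the human preferred the first.
print(preference_loss([1.3, -0.2, 0.5, 0.1], preferred_index=0))
print(shaped_reward(learned_reward=2.0, logprob_policy=-1.1, logprob_pretrained=-1.6))

In the paper's setup, the KL term discourages the policy from drifting into degenerate outputs that exploit the reward model, which is why the penalty is applied per sample rather than only at evaluation time.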
Quick Start & Requirements
Install dependencies with pipenv install. The gsutil tool is required to download the data files, and Horovod is recommended for faster training. Human preference labels are hosted at https://openaipublic.blob.core.windows.net/lm-human-preferences/labels (see the download sketch below).
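As a hedged alternative to gsutil, a single labels file can be fetched over HTTPS from the URL above. The file name below is a placeholder; the actual file names live under the repository's launch scripts and data paths, which this sketch does not assume.

import urllib.request

BASE = "https://openaipublic.blob.core.windows.net/lm-human-preferences/labels"
FILENAME = "example_labels.json"  # placeholder name, not a real path in the bucket

# Download one labels file next to the current working directory.
urllib.request.urlretrieve(f"{BASE}/{FILENAME}", FILENAME)
print(f"downloaded {FILENAME}")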
Highlighted Details
Maintenance & Community
The project is marked as "Archive" and no updates are expected. Pull requests are welcome.
Licensing & Compatibility
Limitations & Caveats
The code is provided as-is and may no longer work because the original storage paths for data files have been migrated. It has only been tested with the smallest GPT-2 model (124M parameters) and Python 3.7.3.