lm-human-preferences by openai

Code for fine-tuning language models using human preferences

Created 5 years ago · 1,350 stars


Project Summary

This repository provides code for fine-tuning language models based on human preferences, as detailed in the paper "Fine-Tuning Language Models from Human Preferences." It targets researchers and engineers interested in aligning language model behavior with human feedback, enabling the training of reward models and subsequent policy fine-tuning.

How It Works

The project implements a reinforcement learning approach where a reward model is trained on human-labeled preference data. This reward model then guides the fine-tuning of a language model (policy) to generate outputs that maximize the predicted reward. The core advantage lies in directly optimizing for human-defined quality metrics, moving beyond traditional supervised learning objectives.
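
To make this concrete, below is a minimal NumPy sketch of the two objectives described in the paper, not the repository's actual implementation (which is TensorFlow 1.x). The reward model is fit with a softmax cross-entropy loss over the scores of k candidate completions, one of which the human labeler picked; the policy's per-sample RL reward is the reward-model score minus a KL penalty that keeps the policy close to the original pretrained model. Function names are illustrative.

    import numpy as np

    def reward_model_loss(candidate_rewards, human_choice):
        # Best-of-k comparison loss: treat the k reward scores (a NumPy
        # array) as logits and take cross-entropy against the candidate
        # the human labeler chose.
        logits = candidate_rewards - candidate_rewards.max()  # stability
        log_probs = logits - np.log(np.exp(logits).sum())
        return -log_probs[human_choice]

    def penalized_reward(reward, logprob_policy, logprob_pretrained, beta):
        # RL reward = reward-model score minus beta * log(pi / rho),
        # a KL penalty against the original pretrained model rho.
        return reward - beta * (logprob_policy - logprob_pretrained)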

Quick Start & Requirements

  • Install: pipenv install
  • Prerequisites: Python 3.7.3, TensorFlow 1.13.1 (GPU version requires CUDA 10.0 and cuDNN 7.6.2), gsutil. Horovod is recommended for faster training.
  • Hardware: Tested on 8 V100 GPUs for training; development possible on macOS. CPU training is possible but very slow.
  • Data: Human labels are available at https://openaipublic.blob.core.windows.net/lm-human-preferences/labels.
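
Since the labels are served over plain HTTPS, a snapshot can be fetched with the standard library alone. A minimal sketch, where the file name is a hypothetical placeholder (the actual files under the labels path are not enumerated here):

    import urllib.request

    BASE = "https://openaipublic.blob.core.windows.net/lm-human-preferences/labels"
    FILENAME = "sentiment/offline_5k.json"  # placeholder; substitute a real file name

    urllib.request.urlretrieve(f"{BASE}/{FILENAME}", "labels.json")
    print("saved labels.json")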

Highlighted Details

  • Code supports training reward models from human labels and fine-tuning language models using these reward models.
  • Pre-trained models and human labels are released.
  • Supports distributed training via Horovod.
  • Includes scripts for training reward models, fine-tuning policies, and sampling from trained policies.
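
The policy fine-tuning in the paper is done with PPO. As a rough sketch of the clipped surrogate objective involved (again NumPy for clarity; the repo's TensorFlow 1.x implementation includes further terms, such as a value-function loss):

    import numpy as np

    def ppo_clipped_objective(logp_new, logp_old, advantages, clip_range=0.2):
        # PPO's clipped surrogate objective (to be maximized). The
        # advantages would be derived from the KL-penalized rewards above.
        ratio = np.exp(logp_new - logp_old)
        clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
        return np.minimum(ratio * advantages, clipped * advantages).mean()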

Maintenance & Community

The project is marked as "Archive," so no further updates are expected, although pull requests are still welcome.

Licensing & Compatibility

  • License: MIT
  • Compatibility: Suitable for commercial use.

Limitations & Caveats

The code is provided as-is and may no longer work due to migrated storage paths. It has only been tested with the smallest GPT-2 model (124M parameters) and Python 3.7.3.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 32 stars in the last 90 days
