Code for fine-tuning language models using human preferences
This repository provides code for fine-tuning language models based on human preferences, as detailed in the paper "Fine-Tuning Language Models from Human Preferences." It targets researchers and engineers interested in aligning language model behavior with human feedback, enabling the training of reward models and subsequent policy fine-tuning.
How It Works
The project implements a reinforcement learning approach where a reward model is trained on human-labeled preference data. This reward model then guides the fine-tuning of a language model (policy) to generate outputs that maximize the predicted reward. The core advantage lies in directly optimizing for human-defined quality metrics, moving beyond traditional supervised learning objectives.
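The NumPy sketch below illustrates the two training signals just described: a cross-entropy loss on human comparisons for the reward model, and a KL-penalized reward for policy fine-tuning. Function names, shapes, and the beta value are illustrative assumptions, not taken from the repository, which implements these objectives in TensorFlow with GPT-2.

import numpy as np

def log_softmax(scores):
    """Numerically stable log-softmax over a 1-D array of scores."""
    shifted = scores - np.max(scores)
    return shifted - np.log(np.sum(np.exp(shifted)))

def preference_loss(candidate_rewards, preferred_index):
    """Reward-model loss for one human comparison: the reward model scores
    each candidate continuation, and the human-chosen candidate should win
    the softmax over those scores (cross-entropy on the choice)."""
    return -log_softmax(np.asarray(candidate_rewards, dtype=float))[preferred_index]

def shaped_reward(learned_reward, logprob_policy, logprob_pretrained, beta=0.1):
    """Signal the policy is fine-tuned to maximize: the learned reward minus
    a KL penalty keeping the policy close to the pretrained language model."""
    return learned_reward - beta * (logprob_policy - logprob_pretrained)

# Example: four sampled continuations, of which the human preferred the first.
print(preference_loss([1.3, -0.2, 0.5, 0.1], preferred_index=0))
print(shaped_reward(learned_reward=2.0, logprob_policy=-1.1, logprob_pretrained=-1.6))

In the paper's setup, the KL term discourages the policy from drifting into degenerate outputs that exploit the reward model, which is why the penalty is applied per sample rather than only at evaluation time.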
Quick Start & Requirements
Install dependencies with pipenv install. The gsutil tool is required to download the data files, and Horovod is recommended for faster training. Human preference labels are hosted at https://openaipublic.blob.core.windows.net/lm-human-preferences/labels (see the download sketch below).
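As a hedged alternative to gsutil, a single labels file can be fetched over HTTPS from the URL above. The file name below is a placeholder; the actual file names live under the repository's launch scripts and data paths, which this sketch does not assume.

import urllib.request

BASE = "https://openaipublic.blob.core.windows.net/lm-human-preferences/labels"
FILENAME = "example_labels.json"  # placeholder name, not a real path in the bucket

# Download one labels file next to the current working directory.
urllib.request.urlretrieve(f"{BASE}/{FILENAME}", FILENAME)
print(f"downloaded {FILENAME}")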
Highlighted Details
Maintenance & Community
The project is marked as "Archive" and no updates are expected. Pull requests are welcome.
Licensing & Compatibility
Limitations & Caveats
The code is provided as-is and may no longer work because the original storage paths for data files have been migrated. It has only been tested with the smallest GPT-2 model (124M parameters) and Python 3.7.3.