RLHF implementation on PaLM
This repository provides a PyTorch implementation of Reinforcement Learning from Human Feedback (RLHF) applied to the PaLM architecture, aiming to replicate ChatGPT-like capabilities. It's targeted at researchers and developers interested in open-source LLM alignment and training.
How It Works
The project implements the full RLHF pipeline: pretraining a base language model (PaLM), training a reward model on human preference data, and finally fine-tuning the language model with reinforcement learning (PPO) guided by that reward model. It leverages Flash Attention for efficiency and offers optional LoRA fine-tuning for the reward model.
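The two learned objectives in this pipeline can be illustrated in generic PyTorch. The sketch below is illustrative only, not the repository's actual code: a Bradley-Terry pairwise loss of the kind used to train reward models on preference data, and the PPO clipped surrogate used in the RL fine-tuning stage.

```python
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards, rejected_rewards):
    # Bradley-Terry pairwise objective: the reward model should score
    # the human-preferred completion above the rejected one
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    # PPO clipped surrogate: limit how far the fine-tuned policy can
    # drift from the behavior policy in a single update
    ratio = (logp_new - logp_old).exp()
    unclipped = ratio * advantages
    clipped = ratio.clamp(1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# toy values: reward-model scores for two (chosen, rejected) pairs
reward_loss = preference_loss(torch.tensor([1.5, 0.2]),
                              torch.tensor([0.3, -0.4]))

# toy values: identical old/new policies (ratio = 1) and positive advantages
policy_loss = ppo_clip_loss(torch.zeros(3), torch.zeros(3),
                            torch.tensor([1.0, 2.0, 3.0]))
```

In the actual pipeline the rewards come from the trained reward model and the advantages from rollouts scored by it; these functions only show the shape of the objectives.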
Quick Start & Requirements
```shell
pip install palm-rlhf-pytorch
```
Maintenance & Community
The project is sponsored by Stability.ai and acknowledges contributions from Hugging Face and CarperAI. It mentions ongoing work and potential successors like Direct Preference Optimization. Community discussion channels are not explicitly linked.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is marked as "WIP" (work in progress). It explicitly states that no trained model is included, and significant compute resources are required for training. Whether LoRA fine-tuning is effective for the reward model remains an open research question.