RLHF implementation on PaLM
This repository provides a PyTorch implementation of Reinforcement Learning from Human Feedback (RLHF) applied to the PaLM architecture, aiming to replicate ChatGPT-like capabilities. It's targeted at researchers and developers interested in open-source LLM alignment and training.
How It Works
The project implements the full RLHF pipeline: pretraining a base language model (PaLM), training a reward model on human preference data, and finally fine-tuning the language model with reinforcement learning (PPO) guided by that reward model. It leverages Flash Attention for efficiency and offers optional LoRA fine-tuning for the reward model.
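The two learned objectives in this pipeline can be illustrated in generic PyTorch. The sketch below is illustrative only, not the repository's actual code: a Bradley-Terry pairwise loss of the kind used to train reward models on preference data, and the PPO clipped surrogate used in the RL fine-tuning stage.

```python
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards, rejected_rewards):
    # Bradley-Terry pairwise objective: the reward model should score
    # the human-preferred completion above the rejected one
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    # PPO clipped surrogate: limit how far the fine-tuned policy can
    # drift from the behavior policy in a single update
    ratio = (logp_new - logp_old).exp()
    unclipped = ratio * advantages
    clipped = ratio.clamp(1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# toy values: reward-model scores for two (chosen, rejected) pairs
reward_loss = preference_loss(torch.tensor([1.5, 0.2]),
                              torch.tensor([0.3, -0.4]))

# toy values: identical old/new policies (ratio = 1) and positive advantages
policy_loss = ppo_clip_loss(torch.zeros(3), torch.zeros(3),
                            torch.tensor([1.0, 2.0, 3.0]))
```

In the actual pipeline the rewards come from the trained reward model and the advantages from rollouts scored by it; these functions only show the shape of the objectives.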
Quick Start & Requirements
```shell
pip install palm-rlhf-pytorch
```
Maintenance & Community
The project is sponsored by Stability.ai and acknowledges contributions from Hugging Face and CarperAI. It mentions ongoing work and potential successors like Direct Preference Optimization. Community discussion channels are not explicitly linked.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is marked as "WIP" (work in progress). It explicitly states that no trained model is included, and significant compute resources are required for training. Whether LoRA fine-tuning is effective for the reward model remains an open research question.