PaLM-rlhf-pytorch by lucidrains

RLHF implementation on PaLM

created 2 years ago
7,855 stars

Top 6.8% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of Reinforcement Learning from Human Feedback (RLHF) on top of the PaLM architecture, with the goal of reproducing ChatGPT-like capabilities. It is aimed at researchers and developers interested in open-source LLM alignment and training.

How It Works

The project implements the three-stage RLHF pipeline: train a base language model with the PaLM architecture, train a reward model on human preference data, and then fine-tune the language model with reinforcement learning (PPO), using the reward model's scores as the reward signal. It uses Flash Attention for efficiency and offers optional LoRA fine-tuning for the reward model.
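
The PPO stage described above optimizes a clipped surrogate objective on rewards from the reward model. The scalar sketch below assumes the standard PPO-clip formulation with a KL penalty against the frozen pre-RL model (the recipe popularized by InstructGPT); it is not taken from this repository, the names `shaped_reward`, `ppo_clip_loss`, `kl_coef`, and `clip_eps` are hypothetical, and advantage estimation (e.g. GAE) is omitted.

```python
import math


def shaped_reward(reward_model_score, logprob_policy, logprob_ref, kl_coef=0.1):
    # Reward-model score minus a KL penalty that keeps the policy close to the
    # reference (pre-RL) model. The per-sample log-ratio is used as a simple KL estimate.
    return reward_model_score - kl_coef * (logprob_policy - logprob_ref)


def ppo_clip_loss(logprob_new, logprob_old, advantage, clip_eps=0.2):
    # Probability ratio between the current policy and the policy that sampled the action.
    ratio = math.exp(logprob_new - logprob_old)
    unclipped = ratio * advantage
    # Clamp the ratio to [1 - clip_eps, 1 + clip_eps] to limit the size of each update.
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    # PPO maximizes the pessimistic (minimum) surrogate; negate it to get a loss to minimize.
    return -min(unclipped, clipped)


if __name__ == "__main__":
    # With a positive advantage, a ratio of 2.0 is clipped to 1 + clip_eps.
    print(ppo_clip_loss(math.log(2.0), 0.0, advantage=1.0))
    print(shaped_reward(1.0, logprob_policy=-1.0, logprob_ref=-1.5))
```

Taking the minimum of the clipped and unclipped terms means clipping only removes the incentive to move further in a direction that already helps; with a negative advantage and a large ratio, the full (unclipped) penalty still applies.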

Quick Start & Requirements

  • Install: pip install palm-rlhf-pytorch
  • Requirements: PyTorch, a CUDA-capable GPU for acceleration, and large training datasets for any realistic run.
  • Usage examples are provided for training the base PaLM model, the reward model, and the RLHF trainer.

Highlighted Details

  • Implements RLHF on top of PaLM, following the approach popularized by ChatGPT.
  • Includes optional LoRA fine-tuning for the reward model.
  • Leverages Flash Attention for improved performance.
  • Supports training a standalone reward model and running RLHF fine-tuning with a dedicated trainer.
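
LoRA fine-tuning, listed above as an option for the reward model, keeps the pretrained weight frozen and learns a low-rank update scaled by `alpha / r`. The plain-Python sketch below shows the generic LoRA formulation, assuming rank `r`, scaling `alpha / r`, small random init for `A`, and zero init for `B`; `LoRALinear` and `matvec` are hypothetical names, not this repository's API, and the training loop is omitted.

```python
import random


def matvec(matrix, vector):
    # Dense matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical layer: y = W x + (alpha / r) * B (A x), with W treated as frozen."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight            # pretrained weight (d_out x d_in), not updated
        self.scale = alpha / rank
        # A (r x d_in): small random init. B (d_out x r): zero init, so the
        # layer initially reproduces the pretrained output exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        low_rank = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, low_rank)]

    def num_trainable(self):
        # Only A and B would be trained.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], rank=1)
    print(layer.forward([1.0, 1.0]), layer.num_trainable())
```

The appeal for a reward model is parameter count: for a 4096 x 4096 weight with rank 8, the update has 2 x 4096 x 8 = 65,536 trainable values instead of 16,777,216.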

Maintenance & Community

The project is sponsored by Stability.ai and acknowledges contributions from Hugging Face and CarperAI. The README also notes ongoing work and points to possible successor methods such as Direct Preference Optimization (DPO). Community discussion channels are not explicitly linked.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is marked as a work in progress (WIP). It states explicitly that no trained model is included and that training requires significant compute. Whether LoRA fine-tuning is effective for the reward model is described as an open research question.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 77 stars in the last 90 days
