trlx by CarperAI

Distributed RLHF for LLMs

Created 3 years ago
4,738 stars

Top 10.4% on SourcePulse

Project Summary

trlX is a distributed training framework for fine-tuning large language models using Reinforcement Learning from Human Feedback (RLHF). It supports models up to 20B parameters via Hugging Face Accelerate and larger models using NVIDIA NeMo, offering PPO and ILQL algorithms.

How It Works

trlX uses Hugging Face Accelerate for distributed training of models up to 20B parameters and NVIDIA NeMo for larger ones. Training takes either a custom reward function or a reward-labeled dataset, and the framework runs the PPO or ILQL training loop so that users do not have to implement the RL algorithm themselves.
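The two training modes described above can be sketched as follows. This is a minimal illustration, not a tuned recipe: it assumes trlx is installed, uses the `trlx.train` entry point with its `reward_fn`, `samples`, and `rewards` arguments, and the helper names (`count_cats_reward`, `train_online`, `train_offline`), the toy reward, and the toy dataset are hypothetical. The trlx import is deferred so the reward function can be checked without a GPU.

```python
from typing import List

# Toy reward-labeled dataset (hypothetical values, for illustration only).
OFFLINE_SAMPLES: List[str] = ["dolphins", "geese"]
OFFLINE_REWARDS: List[float] = [1.0, 100.0]


def count_cats_reward(samples: List[str], **kwargs) -> List[float]:
    """Toy reward: one point per occurrence of 'cats' in each generated sample."""
    return [float(sample.count("cats")) for sample in samples]


def train_online(model_name: str = "gpt2"):
    """Online mode: the model generates text and is scored by a reward function."""
    import trlx  # deferred import: requires trlx and its dependencies to be installed

    return trlx.train(model_name, reward_fn=count_cats_reward)


def train_offline(model_name: str = "gpt2"):
    """Offline mode: train from samples that already carry reward labels."""
    import trlx  # deferred import, as above

    return trlx.train(model_name, samples=OFFLINE_SAMPLES, rewards=OFFLINE_REWARDS)
```

Calling `train_online()` or `train_offline()` from a script started with `accelerate launch` would run the corresponding mode across the configured devices. The reward function itself is plain Python and can be unit-tested in isolation.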

Quick Start & Requirements

  • Install via pip install -e . after cloning the repository.
  • Requires PyTorch with CUDA 11.8 (pip install torch --extra-index-url https://download.pytorch.org/whl/cu118).
  • Supports distributed training via accelerate launch.
  • Documentation: linked from the project README.
  • Colab Notebooks: Simulacra (GPT2, ILQL), Sentiment (GPT2, ILQL)

Highlighted Details

  • Supports fine-tuning causal and T5-based LLMs up to 20B parameters with Accelerate.
  • Integrates NVIDIA NeMo for efficient parallelism on models >20B parameters.
  • Implements Proximal Policy Optimization (PPO) and Implicit Language Q-Learning (ILQL).
  • Offers a human-in-the-loop data collection library called CHEESE.
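For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched in a few lines of plain Python. This is a generic textbook illustration, not trlX's implementation: the function name `ppo_clipped_loss`, its arguments, and the default `clip_range` of 0.2 are assumptions chosen for the example.

```python
import math
from typing import Sequence


def ppo_clipped_loss(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    clip_range: float = 0.2,  # assumed default, chosen for illustration
) -> float:
    """Mean PPO clipped-surrogate policy loss over a batch of actions (tokens)."""
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_range, 1 + clip_range].
        clipped = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
        # PPO maximizes the smaller of the two surrogates; negate to get a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

The clipping removes the incentive to move the policy far from the one that generated the samples in a single update: once the ratio leaves the clip interval in the direction the advantage favors, the loss stops improving.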

Maintenance & Community

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • The README does not specify a license, which may impact commercial use or closed-source integration.
  • Setup for NeMo-based training requires following separate NeMo README instructions.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 10 stars in the last 30 days
