RL library to fine-tune language models to human preferences
Top 20.0% on sourcepulse
RL4LMs is a modular Python library designed for fine-tuning large language models (LLMs) to align with human preferences using reinforcement learning. It provides customizable building blocks for various NLP tasks, enabling researchers and practitioners to optimize LLMs with arbitrary reward functions and datasets.
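As an illustration of what plugging in an arbitrary reward function looks like, below is a minimal sketch of a custom reward. The import paths, the `RewardFunction`/`Observation` names, the `__call__` signature, and the `context_text` attribute are assumptions modeled on the library's documented extension pattern and may not match the current API exactly.

```python
from typing import Any, Dict

# Assumed import paths -- check the RL4LMs source for the exact locations.
from rl4lms.envs.text_generation.observation import Observation
from rl4lms.envs.text_generation.reward import RewardFunction


class LengthPenaltyReward(RewardFunction):
    """Toy reward: prefer generations close to a target length."""

    def __init__(self, target_tokens: int = 50) -> None:
        super().__init__()
        self.target_tokens = target_tokens

    def __call__(
        self,
        prev_observation: Observation,
        action: int,
        current_observation: Observation,
        done: bool,
        meta_info: Dict[str, Any] = None,
    ) -> float:
        # Only score the finished generation; intermediate steps get 0 reward.
        if not done:
            return 0.0
        # `context_text` is assumed to hold the generated text so far.
        gen_tokens = len(current_observation.context_text.split())
        # Penalize absolute deviation from the target length.
        return -abs(gen_tokens - self.target_tokens) / self.target_tokens
```

In practice, the library's built-in rewards wrap NLP metrics such as ROUGE or BERTScore rather than a toy length penalty like this one.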
How It Works
The library implements on-policy RL algorithms (PPO, A2C, TRPO, NLPO) and actor-critic policies for both causal and sequence-to-sequence LLMs. It integrates a wide array of NLP metrics (lexical, semantic, task-specific) that can serve as reward functions. The framework uses a gym-style text generation environment, enhanced with stable-baselines3's SubprocVecEnv for parallel rollouts, and supports adaptive KL divergence control to keep the fine-tuned policy from drifting too far from the original model.
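To make adaptive KL control concrete, here is a generic sketch in the style of the proportional controller of Ziegler et al. commonly used for RLHF; the class name, constants, and update rule below are illustrative and not taken from RL4LMs' source.

```python
class AdaptiveKLController:
    """Proportional controller that adapts the KL penalty coefficient.

    If the measured KL between the fine-tuned policy and the frozen
    reference model exceeds the target, the penalty grows; if it falls
    below, the penalty shrinks. Names and constants are illustrative.
    """

    def __init__(self, init_kl_coef: float = 0.2, target_kl: float = 6.0,
                 horizon: int = 10_000) -> None:
        self.kl_coef = init_kl_coef
        self.target_kl = target_kl
        self.horizon = horizon

    def update(self, current_kl: float, n_steps: int) -> float:
        # Proportional error, clipped so a single bad batch cannot
        # swing the coefficient too violently.
        proportional_error = max(-0.2, min(0.2, current_kl / self.target_kl - 1.0))
        self.kl_coef *= 1.0 + proportional_error * n_steps / self.horizon
        return self.kl_coef
```

The resulting coefficient scales a KL penalty that is subtracted from the task reward (r_total = r_task - kl_coef * KL(policy || reference)); keeping it adaptive spends the KL budget evenly over training instead of letting the policy collapse onto the reward early.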
Quick Start & Requirements
git clone https://github.com/allenai/RL4LMs.git
cd RL4LMs
pip install -e .
python scripts/training/train_text_generation.py --config_path scripts/training/task_configs/summarization/t5_ppo.yml
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The library is primarily focused on on-policy algorithms and may require significant configuration for custom RL setups. Some metric computations may depend on external tools such as Stanford CoreNLP.