RL4LMs by allenai

RL library to fine-tune language models to human preferences

created 3 years ago
2,332 stars

Top 20.0% on sourcepulse

View on GitHub
Project Summary

RL4LMs is a modular Python library designed for fine-tuning large language models (LLMs) to align with human preferences using reinforcement learning. It provides customizable building blocks for various NLP tasks, enabling researchers and practitioners to optimize LLMs with arbitrary reward functions and datasets.

How It Works

The library implements on-policy RL algorithms (PPO, A2C, TRPO, NLPO) and actor-critic policies for both causal and sequence-to-sequence LLMs. It integrates a wide array of NLP metrics (lexical, semantic, and task-specific) that can serve as reward functions. The framework wraps text generation in a gym-style environment, uses stable-baselines3's SubprocVecEnv for parallel rollouts, and supports adaptive KL divergence control to keep the fine-tuned policy from drifting too far from the original model.
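The adaptive KL control mentioned above is, conceptually, a proportional controller on the KL penalty coefficient, as used in prior RLHF fine-tuning work. The sketch below is a minimal illustration of that idea; the class, parameter names, and numbers are assumptions for illustration, not RL4LMs' actual API.

```python
import numpy as np


class AdaptiveKLController:
    """Illustrative proportional controller that keeps the fine-tuned policy's
    KL divergence from the reference LM near a target value. A sketch of the
    general technique, not the RL4LMs implementation."""

    def __init__(self, init_kl_coef: float = 0.2, target_kl: float = 6.0, horizon: int = 10000):
        self.kl_coef = init_kl_coef   # weight of the KL penalty subtracted from the reward
        self.target_kl = target_kl    # desired KL between fine-tuned and reference model
        self.horizon = horizon        # controls how quickly the coefficient adapts

    def update(self, observed_kl: float, n_steps: int) -> float:
        # Proportional error, clipped so a single noisy batch cannot swing the coefficient.
        error = float(np.clip(observed_kl / self.target_kl - 1.0, -0.2, 0.2))
        self.kl_coef *= 1.0 + error * n_steps / self.horizon
        return self.kl_coef


# Example: shape one batch of metric-based rewards with the KL penalty.
controller = AdaptiveKLController()
task_rewards = np.array([0.41, 0.77, 0.12])     # e.g., ROUGE-based rewards per sample
kl_to_reference = np.array([5.2, 7.9, 6.4])     # KL(policy || reference LM) per sample
shaped_rewards = task_rewards - controller.kl_coef * kl_to_reference
controller.update(observed_kl=kl_to_reference.mean(), n_steps=len(task_rewards))
```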

Quick Start & Requirements

  • Install: git clone https://github.com/allenai/RL4LMs.git, then pip install -e . from the repo root.
  • Dependencies: Python; Stanford CoreNLP is optionally required for certain metrics (e.g., SPICE).
  • Demo: https://rl4lms.apps.allenai.org/
  • Example Training: python scripts/training/train_text_generation.py --config_path scripts/training/task_configs/summarization/t5_ppo.yml
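Since runs are driven entirely by the YAML config passed via --config_path, a quick way to experiment is to load the shipped config, tweak it, and point the training script at the modified copy. The snippet below is a minimal sketch; it assumes PyYAML is installed and makes no assumptions about the keys inside the file.

```python
import yaml

# Load the example PPO summarization config shipped with the repo.
config_path = "scripts/training/task_configs/summarization/t5_ppo.yml"
with open(config_path) as f:
    config = yaml.safe_load(f)

print(sorted(config.keys()))  # inspect the top-level sections before editing

# Save a modified copy and launch training against it:
#   python scripts/training/train_text_generation.py --config_path my_t5_ppo.yml
with open("my_t5_ppo.yml", "w") as f:
    yaml.safe_dump(config, f)
```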

Highlighted Details

  • Benchmarked on 7 NLP tasks including summarization, QA, and dialogue generation.
  • Supports 20+ NLG metrics as reward functions (e.g., ROUGE, BLEU, BERTScore, BLEURT).
  • Implements PPO, A2C, TRPO, and the novel NLPO (Natural Language Policy Optimization) algorithm.
  • Offers extensive customizability for datasets, reward functions, metrics, and algorithms (see the reward-function sketch after this list).
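To give a feel for the reward-function customization, here is a hypothetical completion-level reward that wraps an arbitrary NLG metric and returns its score only when generation finishes; the class name, interface, and toy metric are illustrative assumptions, not the library's actual classes.

```python
from typing import Callable


class MetricAsReward:
    """Hypothetical sketch: wrap any sentence-level NLG metric (ROUGE, BLEU,
    BERTScore, a learned scorer, ...) as a sparse episode-end reward. This only
    mirrors the idea behind RL4LMs' reward functions, not their exact interface."""

    def __init__(self, metric_fn: Callable[[str, str], float]):
        self.metric_fn = metric_fn  # (generated_text, reference_text) -> score

    def __call__(self, generated_text: str, reference_text: str, done: bool) -> float:
        # Intermediate decoding steps get zero reward; the metric is only
        # computed once the generation is complete.
        if not done:
            return 0.0
        return float(self.metric_fn(generated_text, reference_text))


def unigram_overlap(hyp: str, ref: str) -> float:
    """Toy stand-in for a real metric such as ROUGE-1 recall."""
    hyp_tokens, ref_tokens = set(hyp.lower().split()), set(ref.lower().split())
    return len(hyp_tokens & ref_tokens) / max(len(ref_tokens), 1)


reward_fn = MetricAsReward(unigram_overlap)
print(reward_fn("the cat sat on the mat", "a cat sat on a mat", done=True))  # 0.8
```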

Maintenance & Community

  • Developed by Allen Institute for AI (AI2).
  • Latest release (v0.2.1) was published in Nov 2022.
  • Slack channel available for discussion and questions.

Licensing & Compatibility

  • License: MIT.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The library focuses primarily on on-policy algorithms and may require significant configuration for custom RL setups. Some metric computations may depend on external tools such as Stanford CoreNLP.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 34 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Chip Huyen (author of AI Engineering and Designing Machine Learning Systems).

LlamaGym by KhoomeiK

  • Top 0.3% · 1k stars
  • SDK for fine-tuning LLM agents with online reinforcement learning
  • created 1 year ago, updated 1 year ago