oat by sail-sg

LLM online alignment framework for research

created 10 months ago
432 stars

Top 68.7% on SourcePulse

Project Summary

OAT (Online Alignment Toolkit) is a research-friendly framework designed for efficient online alignment of Large Language Models (LLMs). It targets researchers and practitioners looking to experiment with and implement state-of-the-art online reinforcement learning and preference learning algorithms for LLMs, offering a simplified workflow and high computational efficiency.

How It Works

OAT employs a distributed Actor-Learner-Oracle architecture. Actors use vLLM for accelerated response sampling, while the Learner relies on DeepSpeed ZeRO strategies for memory-efficient training. The Oracle component, which can be a model-based service (served via Mosec, with dynamic batching and parallelism) or simulated, provides preference, reward, or verification feedback. This design enables flexible querying of online feedback and real-time monitoring of learning curves, streamlining the experimental pipeline.
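To make the division of labor concrete, here is a minimal single-process sketch of an Actor-Learner-Oracle loop. All function names below are illustrative assumptions for this sketch, not OAT's actual API; in the real system the actor runs vLLM, the learner runs DeepSpeed, and the oracle is a remote service.

```python
import random

def actor_sample(prompt):
    """Actor: sample two candidate responses (vLLM in the real system)."""
    return [f"{prompt}-resp-{i}" for i in range(2)]

def oracle_feedback(prompt, responses):
    """Oracle: return the index of the preferred response
    (a reward model, verifier, or LLM judge in the real system)."""
    return random.randrange(len(responses))

def learner_update(policy, prompt, chosen, rejected):
    """Learner: apply one preference-learning update (DeepSpeed ZeRO
    in the real system); here we just record the preference pair."""
    policy.append((prompt, chosen, rejected))
    return policy

def online_alignment_loop(prompts):
    policy = []  # stand-in for model parameters
    for prompt in prompts:
        responses = actor_sample(prompt)           # 1. sample online
        best = oracle_feedback(prompt, responses)  # 2. query feedback
        chosen, rejected = responses[best], responses[1 - best]
        policy = learner_update(policy, prompt, chosen, rejected)  # 3. update
    return policy

history = online_alignment_loop(["p0", "p1"])
print(len(history))  # one update per prompt
```

The point of the split is that steps 1-3 run on separate workers and overlap in time, which is where the framework's efficiency gains come from.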

Quick Start & Requirements

  • Install via pip: pip install vllm==0.8.4 && pip install -U oat-llm
  • For development: git clone git@github.com:sail-sg/oat.git && cd oat && pip install vllm==0.8.4 && pip install -e .
  • Recommended Python version: 3.10; vLLM is pinned at 0.8.4.
  • Official examples and documentation are available via links in the README.

Highlighted Details

  • Implements cutting-edge online alignment algorithms including PPO, Dr.GRPO, online DPO, SimPO, IPO, SEA, APL, and XPO.
  • Up to 2.5x more computationally efficient than Hugging Face's TRL for online DPO.
  • Supports reinforcement learning with verifiable rewards (RLVR) and online exploration methods.
  • Offers flexible Oracle simulation, including LLM-as-a-judge via OpenAI API.
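As a concrete example of the preference-learning objectives listed above, here is a minimal standalone sketch of the DPO loss that online DPO optimizes against freshly sampled, oracle-labeled pairs. The function below is an illustration following the standard DPO formulation, not OAT's implementation.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Inputs are sequence log-probabilities of the chosen/rejected responses
    under the policy and under a frozen reference model.
    """
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss is small.
low = dpo_loss(-1.0, -5.0, -2.0, -2.0)   # positive margin -> small loss
high = dpo_loss(-5.0, -1.0, -2.0, -2.0)  # negative margin -> large loss
print(low < high)  # True
```

In the online setting, the chosen/rejected labels come from the Oracle on responses the current policy just sampled, rather than from a fixed offline dataset.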

Maintenance & Community

The project is actively updated, with recent additions including Dr. GRPO and RLVR support. Key dependencies include vLLM, DeepSpeed, Mosec, Launchpad, and OpenRLHF. No specific community channels (like Discord/Slack) are listed in the README.

Licensing & Compatibility

OAT is distributed under the Apache 2.0 license, which permits commercial use and linking with closed-source projects.

Limitations & Caveats

The specified vLLM version (0.8.4) might be a point of potential incompatibility with newer vLLM releases. The framework is research-oriented, and while it aims for ease of use, complex distributed setups may still require significant configuration.

Health Check
Last commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
5
Issues (30d)
4
Star History
26 stars in the last 30 days

Explore Similar Projects

Starred by Will Brown (Research Lead at Prime Intellect), Junyang Lin (Core Maintainer of Alibaba Qwen), and 4 more.

verifiers by willccbb

2.4%
2k
RL for LLMs in verifiable environments
created 6 months ago
updated 1 day ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Research Scientist at NVIDIA; Author of LMFlow), and 4 more.

simpleRL-reason by hkust-nlp

0.4%
4k
RL recipe for reasoning ability in models
created 6 months ago
updated 1 week ago