oat by sail-sg

LLM online alignment framework for research

created 10 months ago
432 stars

Top 68.7% on SourcePulse

Project Summary

OAT (Online Alignment Toolkit) is a research-friendly framework designed for efficient online alignment of Large Language Models (LLMs). It targets researchers and practitioners looking to experiment with and implement state-of-the-art online reinforcement learning and preference learning algorithms for LLMs, offering a simplified workflow and high computational efficiency.

How It Works

OAT employs a distributed Actor-Learner-Oracle architecture. Actors use vLLM for accelerated response sampling, while the Learner relies on DeepSpeed ZeRO strategies for memory-efficient training. The Oracle component, which can be a model-based service (served via Mosec, with dynamic batching and parallelism) or simulated, provides preference, reward, or verification feedback. This design enables flexible querying of online feedback and real-time monitoring of learning curves, streamlining the experimental pipeline.
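To make the division of labor concrete, here is a minimal single-process sketch of an Actor-Learner-Oracle loop. All function names below are illustrative assumptions for this sketch, not OAT's actual API; in the real system the actor runs vLLM, the learner runs DeepSpeed, and the oracle is a remote service.

```python
import random

def actor_sample(prompt):
    """Actor: sample two candidate responses (vLLM in the real system)."""
    return [f"{prompt}-resp-{i}" for i in range(2)]

def oracle_feedback(prompt, responses):
    """Oracle: return the index of the preferred response
    (a reward model, verifier, or LLM judge in the real system)."""
    return random.randrange(len(responses))

def learner_update(policy, prompt, chosen, rejected):
    """Learner: apply one preference-learning update (DeepSpeed ZeRO
    in the real system); here we just record the preference pair."""
    policy.append((prompt, chosen, rejected))
    return policy

def online_alignment_loop(prompts):
    policy = []  # stand-in for model parameters
    for prompt in prompts:
        responses = actor_sample(prompt)           # 1. sample online
        best = oracle_feedback(prompt, responses)  # 2. query feedback
        chosen, rejected = responses[best], responses[1 - best]
        policy = learner_update(policy, prompt, chosen, rejected)  # 3. update
    return policy

history = online_alignment_loop(["p0", "p1"])
print(len(history))  # one update per prompt
```

The point of the split is that steps 1-3 run on separate workers and overlap in time, which is where the framework's efficiency gains come from.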

Quick Start & Requirements

  • Install via pip: pip install vllm==0.8.4 && pip install -U oat-llm
  • For development: git clone git@github.com:sail-sg/oat.git && cd oat && pip install vllm==0.8.4 && pip install -e .
  • Recommended Python version: 3.10; vLLM is pinned at 0.8.4.
  • Official examples and documentation are available via links in the README.

Highlighted Details

  • Implements cutting-edge online alignment algorithms including PPO, Dr.GRPO, online DPO, SimPO, IPO, SEA, APL, and XPO.
  • Up to 2.5x more computationally efficient than Hugging Face's TRL for online DPO.
  • Supports reinforcement learning with verifiable rewards (RLVR) and online exploration methods.
  • Offers flexible Oracle simulation, including LLM-as-a-judge via OpenAI API.
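As a concrete example of the preference-learning objectives listed above, here is a minimal standalone sketch of the DPO loss that online DPO optimizes against freshly sampled, oracle-labeled pairs. The function below is an illustration following the standard DPO formulation, not OAT's implementation.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Inputs are sequence log-probabilities of the chosen/rejected responses
    under the policy and under a frozen reference model.
    """
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss is small.
low = dpo_loss(-1.0, -5.0, -2.0, -2.0)   # positive margin -> small loss
high = dpo_loss(-5.0, -1.0, -2.0, -2.0)  # negative margin -> large loss
print(low < high)  # True
```

In the online setting, the chosen/rejected labels come from the Oracle on responses the current policy just sampled, rather than from a fixed offline dataset.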

Maintenance & Community

The project is actively updated, with recent additions including Dr. GRPO and RLVR support. Key dependencies include vLLM, DeepSpeed, Mosec, Launchpad, and OpenRLHF. No specific community channels (like Discord/Slack) are listed in the README.

Licensing & Compatibility

OAT is distributed under the Apache 2.0 license, which permits commercial use and linking with closed-source projects.

Limitations & Caveats

The specified vLLM version (0.8.4) might be a point of potential incompatibility with newer vLLM releases. The framework is research-oriented, and while it aims for ease of use, complex distributed setups may still require significant configuration.

Health Check
Last commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
5
Issues (30d)
4
Star History
26 stars in the last 30 days

Explore Similar Projects

Starred by Will Brown (Research Lead at Prime Intellect), Junyang Lin (Core Maintainer of Alibaba Qwen), and 4 more.

verifiers by willccbb

2.4%
2k
RL for LLMs in verifiable environments
created 6 months ago
updated 1 day ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Research Scientist at NVIDIA; Author of LMFlow), and 4 more.

simpleRL-reason by hkust-nlp

0.4%
4k
RL recipe for reasoning ability in models
created 6 months ago
updated 1 week ago