rLLM by rllm-org

Framework for post-training language agents via reinforcement learning

created 6 months ago
3,997 stars

Top 12.3% on SourcePulse

View on GitHub
Project Summary

rLLM is an open-source framework for enhancing language agents through reinforcement learning (RL). It enables users to build custom agents and environments, train them with RL, and deploy them in real-world applications, democratizing access to advanced LLM post-training.

How It Works

rLLM builds on a modified fork of the verl RLHF library, integrating it with open-weight foundation models such as Qwen and DeepSeek. The framework supports iteratively scaling RL algorithms, including GRPO, to increasing context lengths to improve agent performance on complex tasks such as coding and mathematical reasoning. This approach aims to achieve state-of-the-art results with open-weight models.
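
As a rough sketch of the group-relative idea behind GRPO (a generic illustration of the algorithm, not rLLM's or verl's actual implementation): each prompt is rolled out several times, and every rollout's advantage is its reward normalized against the other rollouts for the same prompt, removing the need for a learned critic.

    import numpy as np

    def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
        # rewards has shape (num_prompts, rollouts_per_prompt):
        # one scalar reward per sampled completion of each prompt.
        mean = rewards.mean(axis=1, keepdims=True)
        std = rewards.std(axis=1, keepdims=True)
        # Advantage = reward relative to sibling rollouts of the same
        # prompt; eps guards against zero variance within a group.
        return (rewards - mean) / (std + eps)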

Quick Start & Requirements

  • Installation: Clone the repository with --recurse-submodules, create a conda environment (python=3.10), activate it, then install dependencies with pip install -e ./verl followed by pip install -e . (see the command sequence after this list).
  • Prerequisites: Python 3.10, Conda.
  • Resources: Training experiments are powered by Hugging Face models and datasets. Specific hardware requirements for training are not detailed but are implied to be substantial given the nature of RL training.
  • Links: Documentation, Discord, Website, Hugging Face Collection.
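
A minimal command sequence matching those steps (the repository URL and the environment name rllm are assumptions, not taken verbatim from the README):

    # clone with submodules (URL assumed from the org/repo name)
    git clone --recurse-submodules https://github.com/rllm-org/rllm.git
    cd rllm
    # create and activate the Python 3.10 environment (name assumed)
    conda create -n rllm python=3.10 -y
    conda activate rllm
    # install the bundled verl fork, then rLLM itself
    pip install -e ./verl
    pip install -e .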

Highlighted Details

  • Achieved 59% on SWE-Bench Verified with DeepSWE (32B model), topping leaderboards for open-weight models.
  • DeepCoder (14B model) reached 60.6% Pass@1 on LiveCodeBench, matching top proprietary models (Pass@k is sketched after this list).
  • DeepScaleR (1.5B model) achieved 43.1% Pass@1 on AIME by scaling context length.
  • Training scripts, hyperparameters, and extensive Wandb/evaluation logs are provided for reproducibility.
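
For context, Pass@1 and its generalization Pass@k are commonly computed with the unbiased estimator from the HumanEval paper; a minimal sketch, not tied to this repo's evaluation code:

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        # n generations per problem, c of which pass the tests;
        # returns the probability that at least one of k sampled
        # generations is correct: 1 - C(n-c, k) / C(n, k).
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # For k=1 this reduces to c / n, the fraction of passing samples.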

Maintenance & Community

The project is associated with Berkeley Sky Computing Lab, Berkeley AI Research, and Together AI. Community engagement is facilitated via Discord.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. The project builds on open-source libraries and models, which suggests a permissive stance, but users should verify the licenses of individual components.

Limitations & Caveats

The README mentions potential data compression issues for some Wandb logs due to migration bugs, affecting the original step count for an 8k training run. Specific hardware requirements for training are not detailed.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 18
  • Issues (30d): 22
  • Star History: 247 stars in the last 30 days
