rLLM by rllm-org

Framework for post-training language agents via reinforcement learning

created 6 months ago
3,997 stars

Top 12.3% on SourcePulse

View on GitHub
Project Summary

rLLM is an open-source framework for enhancing language agents through reinforcement learning (RL). It enables users to build custom agents and environments, train them with RL, and deploy them in real-world applications, democratizing access to advanced LLM post-training.

How It Works

rLLM builds on a modified fork of the verl RLHF library, integrating it with open-weight foundation models such as Qwen and DeepSeek. The framework supports iteratively scaling RL algorithms, including GRPO, to increasing context lengths to improve agent performance on complex tasks such as coding and mathematical reasoning. This approach aims to achieve state-of-the-art results with open-weight models.
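
As a rough sketch of the group-relative idea behind GRPO (a generic illustration of the algorithm, not rLLM's or verl's actual implementation): each prompt is rolled out several times, and every rollout's advantage is its reward normalized against the other rollouts for the same prompt, removing the need for a learned critic.

    import numpy as np

    def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
        # rewards has shape (num_prompts, rollouts_per_prompt):
        # one scalar reward per sampled completion of each prompt.
        mean = rewards.mean(axis=1, keepdims=True)
        std = rewards.std(axis=1, keepdims=True)
        # Advantage = reward relative to sibling rollouts of the same
        # prompt; eps guards against zero variance within a group.
        return (rewards - mean) / (std + eps)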

Quick Start & Requirements

  • Installation: Clone the repository with --recurse-submodules, create a conda environment (python=3.10), activate it, then install dependencies with pip install -e ./verl followed by pip install -e . (see the command sequence after this list).
  • Prerequisites: Python 3.10, Conda.
  • Resources: Training experiments are powered by Hugging Face models and datasets. Specific hardware requirements for training are not detailed but are implied to be substantial given the nature of RL training.
  • Links: Documentation, Discord, Website, Hugging Face Collection.
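
A minimal command sequence matching those steps (the repository URL and the environment name rllm are assumptions, not taken verbatim from the README):

    # clone with submodules (URL assumed from the org/repo name)
    git clone --recurse-submodules https://github.com/rllm-org/rllm.git
    cd rllm
    # create and activate the Python 3.10 environment (name assumed)
    conda create -n rllm python=3.10 -y
    conda activate rllm
    # install the bundled verl fork, then rLLM itself
    pip install -e ./verl
    pip install -e .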

Highlighted Details

  • Achieved 59% on SWE-Bench Verified with DeepSWE (32B model), topping leaderboards for open-weight models.
  • DeepCoder (14B model) reached 60.6% Pass@1 on LiveCodeBench, matching top proprietary models (Pass@k is sketched after this list).
  • DeepScaleR (1.5B model) achieved 43.1% Pass@1 on AIME by scaling context length.
  • Training scripts, hyperparameters, and extensive Wandb/evaluation logs are provided for reproducibility.
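
For context, Pass@1 and its generalization Pass@k are commonly computed with the unbiased estimator from the HumanEval paper; a minimal sketch, not tied to this repo's evaluation code:

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        # n generations per problem, c of which pass the tests;
        # returns the probability that at least one of k sampled
        # generations is correct: 1 - C(n-c, k) / C(n, k).
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # For k=1 this reduces to c / n, the fraction of passing samples.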

Maintenance & Community

The project is associated with Berkeley Sky Computing Lab, Berkeley AI Research, and Together AI. Community engagement is facilitated via Discord.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. The project builds on open-source libraries and models, which suggests a permissive stance, but users should verify the licenses of individual components.

Limitations & Caveats

The README mentions potential data compression issues for some Wandb logs due to migration bugs, affecting the original step count for an 8k training run. Specific hardware requirements for training are not detailed.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 18
  • Issues (30d): 22
  • Star History: 247 stars in the last 30 days
