SkyRL by NovaSky-AI

RL training pipeline for multi-turn tool-use LLMs, optimized for real-world tasks

Created 8 months ago
1,438 stars

Top 28.1% on SourcePulse

Project Summary

SkyRL-v0 provides an open reinforcement learning (RL) training pipeline for multi-turn tool-use large language models (LLMs), optimized for long-horizon, real-environment tasks. It targets researchers and developers working on complex agentic systems, offering a framework to improve LLM performance on tasks like software engineering and text-to-SQL.

How It Works

SkyRL-v0 is a fork of the VeRL framework and relies on asynchronous rollouts, provided through the integration of SGLang's async rollout feature. With asynchronous rollouts, many agent episodes can interact with real environments in parallel, so trajectory data is collected concurrently rather than one episode at a time. This is what makes long-horizon tasks practical to train on, since each episode may span many turns of tool calls and environment feedback.
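
The idea of parallel multi-turn rollouts can be illustrated with a minimal sketch in plain Python asyncio. This is not SkyRL, VeRL, or SGLang code: ToyEnvironment, toy_policy, rollout, and collect are hypothetical names invented for illustration, the sleeps stand in for tool and inference latency, and the reward is a placeholder.

```python
import asyncio
import random
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    task_id: int
    turns: list = field(default_factory=list)  # (action, observation) pairs
    reward: float = 0.0


class ToyEnvironment:
    """Hypothetical stand-in for a real environment such as a sandbox or database."""

    def __init__(self, task_id: int, max_turns: int = 3):
        self.task_id = task_id
        self.max_turns = max_turns
        self.turn = 0

    async def step(self, action: str) -> tuple[str, bool]:
        # Simulated tool latency; a real environment would execute the action here.
        await asyncio.sleep(random.uniform(0.01, 0.05))
        self.turn += 1
        done = self.turn >= self.max_turns
        return f"obs-{self.task_id}-{self.turn}", done


async def toy_policy(observation: str) -> str:
    # Hypothetical policy; a real pipeline would query an inference engine.
    await asyncio.sleep(0.01)
    return f"act({observation})"


async def rollout(task_id: int) -> Trajectory:
    """Run one multi-turn episode: alternate policy actions and environment steps."""
    env = ToyEnvironment(task_id)
    traj = Trajectory(task_id)
    obs, done = "start", False
    while not done:
        action = await toy_policy(obs)
        obs, done = await env.step(action)
        traj.turns.append((action, obs))
    traj.reward = 1.0  # placeholder; a real task would score the final state
    return traj


async def collect(num_tasks: int) -> list[Trajectory]:
    # gather() runs all episodes concurrently and returns results in task order.
    return await asyncio.gather(*(rollout(i) for i in range(num_tasks)))


if __name__ == "__main__":
    for t in asyncio.run(collect(4)):
        print(t.task_id, len(t.turns), t.reward)
```

Because each episode awaits its own environment and policy calls, a slow step in one episode does not block the others; the collected trajectories would then feed an RL update in a real pipeline.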

Quick Start & Requirements

  • Installation: Clone the repository with submodules: git clone --recurse-submodules https://github.com/NovaSky-AI/SkyRL
  • Prerequisites: Refer to INSTALL.md for detailed instructions.
  • Resources: Training examples indicate requirements for multiple high-end GPUs (e.g., 8xH100 or 8xH200).
  • Links: Getting Started, SkyRL-SQL Blog Post, SkyRL-v0 Blog Post

Highlighted Details

  • SkyRL-Agent-14B-v0 achieves 21.6% on SWE-Bench-Verified, a 3.6-percentage-point improvement over its base model.
  • SkyRL-SQL-7B outperforms GPT-4o and o4-mini on Spider benchmarks by up to 9.2% with multi-turn RL.
  • Training for SkyRL-Agent-14B-v0 took 20 hours on 8xH200 GPUs.
  • SkyRL-SQL-7B was trained on only 653 samples.

Maintenance & Community

  • The project is associated with Berkeley Sky Computing Lab.
  • Compute support is provided by Anyscale, Databricks, NVIDIA, Lambda Labs, and AMD.
  • Key contributors from Tsinghua University and OpenBMB/ModelBest are acknowledged for SGLang integration.
  • Community links include Website, X, and Discord.

Licensing & Compatibility

  • The repository's license is not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The codebase is a fork of VeRL and is undergoing refactoring to align with the VeRL main branch. Specific hardware requirements (multiple high-end GPUs) may present an adoption barrier. The licensing status requires clarification for commercial applications.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 69
  • Issues (30d): 34
  • Star history: 89 stars in the last 30 days
