SkyRL by NovaSky-AI

RL training pipeline for multi-turn tool-use LLMs, optimized for real-world tasks

Created 4 months ago
870 stars

Top 41.3% on SourcePulse

Project Summary

SkyRL-v0 provides an open reinforcement learning (RL) training pipeline for multi-turn tool-use large language models (LLMs), optimized for long-horizon, real-environment tasks. It targets researchers and developers working on complex agentic systems, offering a framework to improve LLM performance on tasks like software engineering and text-to-SQL.

How It Works

SkyRL-v0 is a fork of the VeRL framework and builds on its asynchronous rollout support. Long-horizon tasks require agents to interact with real environments over many turns, so the pipeline relies on SGLang's async rollout feature to run those interactions in parallel. Parallel rollouts keep data collection from stalling on slow environment steps, which is what makes RL training on such tasks efficient.
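
To illustrate the general idea of asynchronous multi-turn rollouts, here is a minimal sketch using only Python's standard asyncio module. It is not SkyRL, VeRL, or SGLang code: ToyEnv, toy_policy, rollout, and collect are hypothetical names invented for this example, and the sleeps merely stand in for slow tool calls and model inference.

```python
import asyncio
import random


class ToyEnv:
    """Hypothetical stand-in for a real environment (not a SkyRL class)."""

    def __init__(self, max_turns: int) -> None:
        self.max_turns = max_turns
        self.turn = 0

    async def step(self, action: str) -> tuple[str, float, bool]:
        # Simulate a slow tool call (e.g. running tests or a SQL query).
        await asyncio.sleep(random.uniform(0.001, 0.005))
        self.turn += 1
        done = self.turn >= self.max_turns
        reward = 1.0 if done else 0.0  # sparse reward on the final turn only
        return f"obs-{self.turn}", reward, done


async def toy_policy(observation: str) -> str:
    # Placeholder for an async request to an inference engine.
    await asyncio.sleep(0.001)
    return f"action-for-{observation}"


async def rollout(env: ToyEnv) -> list[tuple[str, str, float]]:
    """Run one multi-turn episode and return its (obs, action, reward) steps."""
    trajectory = []
    observation, done = "obs-0", False
    while not done:
        action = await toy_policy(observation)
        next_observation, reward, done = await env.step(action)
        trajectory.append((observation, action, reward))
        observation = next_observation
    return trajectory


async def collect(num_envs: int, max_turns: int) -> list[list[tuple[str, str, float]]]:
    # gather() interleaves episodes: while one episode waits on its
    # environment, the others keep making progress.
    envs = [ToyEnv(max_turns) for _ in range(num_envs)]
    return await asyncio.gather(*(rollout(env) for env in envs))


if __name__ == "__main__":
    batch = asyncio.run(collect(num_envs=4, max_turns=3))
    print(len(batch), [len(t) for t in batch])
```

Because every episode yields control whenever it awaits the environment or the policy, total wall-clock time is closer to the slowest single episode than to the sum of all episodes, which is the property that matters when environment steps are expensive.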

Quick Start & Requirements

  • Installation: Clone the repository with submodules: git clone --recurse-submodules https://github.com/NovaSky-AI/SkyRL
  • Prerequisites: Refer to INSTALL.md for detailed instructions.
  • Resources: The training examples assume multiple high-end GPUs (e.g., 8xH100 or 8xH200).
  • Links: Getting Started, SkyRL-SQL Blog Post, SkyRL-v0 Blog Post

Highlighted Details

  • SkyRL-Agent-14B-v0 achieves 21.6% on SWE-Bench-Verified, a 3.6 percentage point improvement over its base model.
  • SkyRL-SQL-7B outperforms GPT-4o and o4-mini on Spider benchmarks by up to 9.2% with multi-turn RL.
  • Training for SkyRL-Agent-14B-v0 took 20 hours on 8xH200 GPUs.
  • SkyRL-SQL-7B was trained on only 653 samples.

Maintenance & Community

  • The project is associated with Berkeley Sky Computing Lab.
  • Compute support is provided by Anyscale, Databricks, NVIDIA, Lambda Labs, and AMD.
  • Key contributors from Tsinghua University and OpenBMB/ModelBest are acknowledged for SGLang integration.
  • Community links include Website, X, and Discord.

Licensing & Compatibility

  • The repository's license is not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The codebase is a fork of VeRL and is undergoing refactoring to align with the VeRL main branch. Specific hardware requirements (multiple high-end GPUs) may present an adoption barrier. The licensing status requires clarification for commercial applications.

Health Check

  • Last commit: 17 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 115
  • Issues (30d): 41
  • Star history: 137 stars in the last 30 days
