Relax by redai-infra

Asynchronous RL engine for scalable, omni-modal LLM post-training

Created 2 weeks ago


333 stars

Top 82.4% on SourcePulse

Project Summary

Relax is a high-performance, asynchronous reinforcement learning engine designed for omni-modal post-training at scale, particularly for multimodal large language models. It addresses the need for a unified, service-oriented framework that decouples training and inference, enabling efficient end-to-end RL training across text, vision, and audio modalities. The framework targets AI researchers and engineers working with large-scale RL applications, offering significant benefits in scalability, flexibility, and throughput.

How It Works

Relax employs a six-layer service-oriented architecture built on Ray Serve, where each component (Actor, Rollout, Critic, etc.) is an independent deployment. This design facilitates native service-level elastic scheduling and fault recovery. Core to its operation is the TransferQueue system, which enables fully asynchronous execution across independent GPU clusters for different training phases (Rollout, Actor, Advantages). This decoupling, combined with Megatron-LM for training and SGLang for inference, allows for high throughput via streaming data exchange and configurable staleness, optimizing resource utilization and training speed.
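The "configurable staleness" idea above can be made concrete with a small sketch: rollout workers tag each sample with the policy version that produced it, and the trainer only consumes samples whose version lag is within a bound. This is a minimal illustration, assuming hypothetical names (`TransferQueue`, `put`, `get_fresh`) — it is not Relax's actual TransferQueue API.

```python
from collections import deque

class TransferQueue:
    """Staleness-bounded transfer queue (illustrative sketch only;
    names and behavior are assumptions, not Relax's actual API).

    Rollout workers push samples tagged with the policy version that
    generated them; the trainer pops only samples whose version lag is
    within `max_staleness` of the current policy, dropping the rest.
    """

    def __init__(self, max_staleness: int = 1):
        self.max_staleness = max_staleness
        self._buffer: deque = deque()

    def put(self, sample, policy_version: int) -> None:
        """Called by rollout workers: enqueue a (sample, version) pair."""
        self._buffer.append((sample, policy_version))

    def get_fresh(self, current_version: int) -> list:
        """Called by the trainer: drain samples that are still fresh enough."""
        fresh = []
        while self._buffer:
            sample, version = self._buffer.popleft()
            if current_version - version <= self.max_staleness:
                fresh.append(sample)
            # Stale samples are silently discarded.
        return fresh

# Rollout produced at versions 3-5 while the trainer advanced to version 5.
q = TransferQueue(max_staleness=1)
q.put("traj_a", policy_version=3)   # lag 2 at consume time -> dropped
q.put("traj_b", policy_version=4)   # lag 1 -> accepted
q.put("traj_c", policy_version=5)   # lag 0 -> accepted
print(q.get_fresh(current_version=5))   # ['traj_b', 'traj_c']
```

Raising `max_staleness` lets rollout run further ahead of training (higher throughput, more off-policy data); setting it to 0 approximates synchronous on-policy training.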

Quick Start & Requirements

  • Primary install / run command: Recommended installation is via the official Docker image (docker pull relaxrl/relax:latest). After launching a container with appropriate GPU and IPC/network settings, clone the repo and install locally (pip install -e .).
  • Non-default prerequisites and dependencies: Requires GPUs with CUDA support. The Docker image bundles necessary versions of CUDA, PyTorch, Megatron-LM, SGLang, and Ray. Multi-node setup requires careful configuration of Ray and distributed storage.
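The documented install flow can be sketched as a shell session. The image name and `pip install -e .` step come from the summary above; the `docker run` flags and repository URL are typical assumptions for a multi-GPU training container, not verbatim from the Relax docs.

```shell
# Pull the official image (the README's recommended path).
docker pull relaxrl/relax:latest

# Launch with GPU access and host IPC/network settings; these flags are
# common for distributed training containers, but check the Relax docs
# for the exact recommended invocation.
docker run -it --gpus all --ipc=host --network=host \
    relaxrl/relax:latest /bin/bash

# Inside the container: clone and install in editable mode.
# (Repository URL is assumed from the org/repo names above.)
git clone https://github.com/redai-infra/relax.git
cd relax
pip install -e .
```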

Highlighted Details

  • Full Omni-Modal Training: Supports text, vision, and audio RL within a single framework, enabling end-to-end training for models like Qwen3-Omni.
  • Fully Async via TransferQueue: Rollout, Actor, and other components run on independent GPU clusters, exchanging data via streaming and configurable staleness for maximum throughput.
  • Agentic RL: Provides first-class support for multi-turn, closed-loop training with VLM multimodal context carry-over and flexible termination conditions.
  • Elastic Rollout Scaling: Dynamically scales inference engines mid-training via HTTP REST APIs, supporting both intra-cluster (ray_native) and cross-cluster (external) federation.
  • Megatron + SGLang Backends: Utilizes Megatron-LM for distributed training (TP/PP/CP/EP) and SGLang for high-throughput inference, with automatic weight conversion from HuggingFace.
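The elastic-rollout idea above can be illustrated with a toy fleet model: engines register or deregister mid-training, and requests are routed across whatever is currently available. All class and method names here (`RolloutFleet`, `scale_up`, `route`) are hypothetical — Relax exposes this via HTTP REST APIs, which this sketch does not reproduce.

```python
class RolloutFleet:
    """Toy model of elastic rollout scaling (illustrative only; not
    Relax's actual API). New requests are routed round-robin across
    whichever inference engines are currently registered, whether they
    live in the same Ray cluster or are federated from an external one.
    """

    def __init__(self):
        self._engines: dict[str, str] = {}  # engine_id -> cluster kind
        self._rr = 0                        # round-robin cursor

    def scale_up(self, engine_id: str, cluster: str = "ray_native") -> None:
        """Register a new inference engine mid-training."""
        self._engines[engine_id] = cluster

    def scale_down(self, engine_id: str) -> None:
        """Remove an engine; in-flight routing simply skips it afterwards."""
        self._engines.pop(engine_id, None)

    def route(self) -> str:
        """Pick the next engine round-robin; raises if the fleet is empty."""
        ids = sorted(self._engines)
        engine = ids[self._rr % len(ids)]
        self._rr += 1
        return engine

fleet = RolloutFleet()
fleet.scale_up("sgl-0")
fleet.scale_up("sgl-1", cluster="external")  # cross-cluster federation
print([fleet.route() for _ in range(3)])     # alternates across both engines
fleet.scale_down("sgl-0")                    # shrink the fleet mid-training
print(fleet.route())                         # remaining engine serves traffic
```

The point of the sketch is the control-plane shape: scaling decisions are runtime operations against a live registry, not a restart of the training job.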

Maintenance & Community

The project was open-sourced on April 15, 2026. Contributions are welcomed via a dedicated Contributing Guide. Specific community channels (e.g., Discord, Slack) or a public roadmap are not detailed in the README.

Licensing & Compatibility

This project is licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial use and linking within closed-source projects.

Limitations & Caveats

The README does not explicitly state limitations such as alpha status or known bugs. The setup, particularly for multi-node and multi-GPU configurations, requires significant technical expertise with distributed systems (Ray) and deep learning frameworks. The asynchronous nature with configurable staleness may require careful tuning to balance on-policy accuracy with training throughput.

Health Check

  • Last Commit: 7 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 7
  • Issues (30d): 13
  • Star History: 335 stars in the last 15 days

Explore Similar Projects

  • ROLL by alibaba: RL library for large language models. ~3k stars; created 11 months ago, updated 12 hours ago. Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Wing Lian (founder of Axolotl AI), and 3 more.
  • rl by pytorch: PyTorch library for reinforcement learning research. ~3k stars; created 4 years ago, updated 6 hours ago. Starred by Evan Hubinger (Head of Alignment Stress-Testing at Anthropic), Jiayi Pan (author of SWE-Gym; MTS at xAI), and 1 more.
  • trlx by CarperAI: Distributed RLHF for LLMs. ~5k stars; created 3 years ago, updated 2 years ago. Starred by Nat Friedman (former CEO of GitHub), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 19 more.