RLWRLD
Vision-Language-Action model for dexterous manipulation
Top 94.6% on SourcePulse
RLDX-1 is a Vision-Language-Action (VLA) model designed for human-like dexterous manipulation in robotics. It targets researchers and engineers who need advanced robotic control, and adds motion awareness, long-term memory, and physical sensing on top of what standard VLMs provide. The project provides a unified architecture plus a training and inference pipeline for developing more capable and adaptable robotic agents.
How It Works
RLDX-1 employs a novel Multi-Stream Action Transformer (MSAT) architecture, an extension of MM-DiT, which dedicates separate streams for cognition, physics, and action, coupled by joint self-attention. This design enables crucial functional capabilities: motion awareness through multi-frame observations and a motion module, long-term memory via a dedicated memory module fusing historical and current features, and physical sensing by integrating tactile and torque data into the physics stream. A synthetic-augmented, three-stage training pipeline (pre-training, mid-training, post-training) enhances generalization and task adaptation.
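The idea of separate streams coupled by joint self-attention can be illustrated with a small, dependency-free sketch. This is not the RLDX-1 implementation: it assumes a single attention head, omits normalization, MLP blocks, and conditioning, and every name in it (`joint_self_attention`, `rand_matrix`, the toy token counts) is hypothetical. It only shows the MM-DiT-style pattern in which each stream keeps its own projection weights while all tokens attend to each other in one shared attention pass.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Small random matrix, used here for toy tokens and weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def joint_self_attention(streams, weights):
    """Single-head joint attention over several token streams.

    streams: dict of stream name -> list of token vectors (length d each)
    weights: dict of stream name -> (Wq, Wk, Wv), each d x d and private
             to that stream (hypothetical layout, for illustration only)
    """
    q, k, v, spans = [], [], [], {}
    for name, tokens in streams.items():
        wq, wk, wv = weights[name]
        start = len(q)
        # Each stream projects its own tokens with its own weights.
        q += matmul(tokens, wq)
        k += matmul(tokens, wk)
        v += matmul(tokens, wv)
        spans[name] = (start, len(q))

    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    # One attention matrix over the concatenated sequence couples the streams.
    scores = matmul(q, k_t)
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    mixed = matmul(attn, v)

    # Split the result back into per-stream outputs.
    outputs = {name: mixed[a:b] for name, (a, b) in spans.items()}
    return outputs, attn


rng = random.Random(0)
d = 4
streams = {
    "cognition": rand_matrix(3, d, rng),  # e.g. vision-language tokens
    "physics": rand_matrix(2, d, rng),    # e.g. tactile and torque tokens
    "action": rand_matrix(5, d, rng),     # e.g. action-chunk tokens
}
weights = {name: tuple(rand_matrix(d, d, rng) for _ in range(3)) for name in streams}
outputs, attn = joint_self_attention(streams, weights)
```

In this sketch the attention matrix is 10 x 10 (3 + 2 + 5 tokens), so an action token can draw on cognition and physics tokens directly, while the per-stream weights stay separate.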
Quick Start & Requirements
Installation involves cloning the repository, setting up a Python 3.10 environment with uv, and installing the package in editable mode:
git clone https://github.com/RLWRLD/RLDX-1.git
cd RLDX-1
uv sync --python 3.10
uv pip install -e .
Documentation is in the docs/ directory, covering installation, architecture, training, and inference. Key links include the Paper and Project Page.
Highlighted Details
Maintenance & Community
The project explicitly states that external pull requests are not accepted. Users encountering bugs or having questions are directed to open issues on the GitHub repository for follow-up. The project builds upon NVIDIA GR00T N1.7, Qwen3-VL, and FLUX.
Licensing & Compatibility
The codebase is released under the Apache License 2.0. However, the model weights are distributed under the RLWRLD Model License v1.0, which is a non-commercial license requiring attribution and share-alike terms. This restricts the use of pre-trained and mid-trained checkpoints to non-commercial applications.
Limitations & Caveats
The non-commercial license for model weights is a significant restriction for potential adopters. Furthermore, the fullgraph inference optimization mode is tuned for the RTX 5090 (sm_120) architecture, so users on other GPU architectures may need to fall back to the submodule compile mode. External contributions to the codebase are not accepted.
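The GPU caveat can be expressed as a small selection rule. The sketch below is an assumption-laden illustration, not part of RLDX-1: the function name `choose_compile_mode` is hypothetical, the mode strings simply reuse the wording of this summary, and how a mode is actually passed to the project's inference code is not shown here. It relies only on the fact that sm_120 corresponds to CUDA compute capability (12, 0).

```python
def choose_compile_mode(capability):
    """Pick a compile mode label from a CUDA compute capability tuple.

    capability: (major, minor), e.g. the tuple returned by
    torch.cuda.get_device_capability() when PyTorch is installed.
    The returned strings are labels taken from this summary, not
    verified RLDX-1 option values (assumption).
    """
    major, minor = capability
    # sm_120 (RTX 5090) is the architecture the fullgraph mode is tuned for.
    if (major, minor) == (12, 0):
        return "fullgraph"
    # Any other architecture falls back to the submodule mode.
    return "submodule"


mode_5090 = choose_compile_mode((12, 0))
mode_other = choose_compile_mode((8, 9))
```

Treating every non-sm_120 device as a fallback case is the conservative reading of the caveat; fullgraph may still work elsewhere, but this summary does not say so.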