Foundation model for zero-shot robotic manipulation across embodiments
RDT2 enables zero-shot cross-embodiment generalization for robotic manipulation tasks, allowing robots to execute instructions on unseen embodiments without retraining. This project targets researchers and engineers in robotics, offering a versatile foundation model for diverse robotic platforms, thereby reducing the need for extensive task-specific fine-tuning.
How It Works
The project features two primary models: RDT2-VQ, an auto-regressive Vision-Language-Action (VLA) model adapted from Qwen2.5-VL-7B-Instruct that uses Residual VQ for action tokenization, and RDT2-FM, an optimized RDT model serving as a low-latency action expert trained with a flow-matching objective. Both are trained on over 10,000 hours of human manipulation data spanning more than 100 indoor scenes, which underpins their cross-embodiment generalization.
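To make the action-tokenization idea concrete, here is a minimal sketch of residual vector quantization (RVQ) applied to a continuous action vector. The codebook sizes, number of stages, and action dimensionality below are illustrative assumptions, not the actual RDT2-VQ tokenizer configuration.

```python
# Sketch of residual vector quantization (RVQ) for action tokenization.
# Assumes hypothetical codebooks; the real RDT2-VQ tokenizer may differ.
import torch

def residual_vq_encode(action, codebooks):
    """Encode a continuous action vector into one discrete token per stage.

    action: (D,) tensor; codebooks: list of (K, D) tensors.
    Each stage quantizes the residual left over from the previous stage.
    """
    residual = action
    tokens = []
    for codebook in codebooks:
        # Find the nearest codeword to the current residual (L2 distance).
        dists = torch.cdist(residual.unsqueeze(0), codebook)  # (1, K)
        idx = dists.argmin(dim=-1)                             # (1,)
        tokens.append(idx.item())
        # Subtract the chosen codeword; the next stage refines what remains.
        residual = residual - codebook[idx.squeeze(0)]
    return tokens

# Toy usage: a 7-DoF action quantized with 3 residual stages of 256 codes each.
torch.manual_seed(0)
codebooks = [torch.randn(256, 7) for _ in range(3)]
action = torch.randn(7)
print(residual_vq_encode(action, codebooks))  # prints one code index per stage
```

Coarse-to-fine quantization like this lets an auto-regressive language-model backbone emit a short sequence of discrete tokens per action while keeping reconstruction error low.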
Quick Start & Requirements
Installation involves cloning the repository, creating a Python 3.10 conda environment, and installing PyTorch (CUDA 12.8), flash-attn, and the remaining dependencies from requirements.txt. Specific robot integrations require additional packages. Hardware demands an NVIDIA GPU with at least 16GB VRAM for inference (RTX 4090 recommended) and 32GB+ VRAM for fine-tuning (A100/H100 for RDT2-VQ LoRA/full). The system is tested on Ubuntu 24.04. Critical setup also includes acquiring the designated end effectors and cameras and performing detailed calibration.
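Before installing, it can help to confirm the local GPU meets the stated VRAM thresholds. The following pre-flight check is a sketch using only standard PyTorch calls; it is not part of the RDT2 codebase, and the thresholds simply mirror the requirements above.

```python
# Pre-flight hardware check (illustrative, not part of RDT2):
# verifies CUDA is available and the GPU has enough VRAM.
import torch

def check_gpu(min_vram_gb: float) -> None:
    if not torch.cuda.is_available():
        raise RuntimeError("No CUDA-capable GPU detected; inference requires one.")
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"Found {props.name} with {vram_gb:.1f} GB VRAM")
    if vram_gb < min_vram_gb:
        raise RuntimeError(
            f"At least {min_vram_gb:.0f} GB VRAM recommended; found {vram_gb:.1f} GB."
        )

check_gpu(min_vram_gb=16)    # inference (e.g. RTX 4090)
# check_gpu(min_vram_gb=32)  # fine-tuning (e.g. A100/H100)
```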