RDT2 by thu-ml

Foundation model for zero-shot robotic manipulation across embodiments

Created 3 weeks ago


512 stars

Top 61.1% on SourcePulse

View on GitHub
Project Summary

RDT2 enables zero-shot cross-embodiment generalization for robotic manipulation: robots can execute language instructions on embodiments they were never trained on, without retraining. The project targets robotics researchers and engineers, offering a versatile foundation model for diverse robotic platforms that reduces the need for extensive task-specific fine-tuning.

How It Works

The project ships two primary models: RDT2-VQ, an auto-regressive Vision-Language-Action (VLA) model adapted from Qwen2.5-VL-7B-Instruct that tokenizes actions with Residual VQ, and RDT2-FM, an optimized RDT model that serves as a low-latency action expert trained with a flow-matching objective. Both are trained on over 10,000 hours of human manipulation data collected across more than 100 indoor scenes, which underpins their robust generalization across embodiments.
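The two ideas behind these models, residual vector quantization of action chunks and flow-matching denoising of actions, can be illustrated with a short sketch. The code below is an illustrative approximation under assumed dimensions, codebook counts, and a toy velocity network; it is not the project's implementation.

```python
import torch
import torch.nn as nn

class ResidualVQ(nn.Module):
    """Toy residual vector quantizer: each stage quantizes the residual
    left by the previous stage, turning a continuous action chunk into a
    short sequence of discrete codebook indices a VLA can emit as tokens."""
    def __init__(self, dim=64, codebook_size=256, num_stages=4):
        super().__init__()
        self.codebooks = nn.ParameterList(
            [nn.Parameter(torch.randn(codebook_size, dim)) for _ in range(num_stages)]
        )

    def encode(self, x):
        residual, indices = x, []
        for cb in self.codebooks:
            d = torch.cdist(residual, cb)      # distances to every codeword
            idx = d.argmin(dim=-1)             # nearest codeword per sample
            residual = residual - cb[idx]      # pass the residual to the next stage
            indices.append(idx)
        return torch.stack(indices, dim=-1)     # (batch, num_stages) discrete tokens

    def decode(self, indices):
        # sum the selected codewords from every stage to reconstruct the action
        return sum(cb[indices[:, i]] for i, cb in enumerate(self.codebooks))


def flow_matching_sample(velocity_fn, obs, action_dim=64, steps=10):
    """Euler integration of a learned velocity field from noise (t=0) to an
    action sample (t=1): the usual flow-matching inference loop."""
    a = torch.randn(obs.shape[0], action_dim)
    dt = 1.0 / steps
    for k in range(steps):
        t = torch.full((obs.shape[0], 1), k * dt)
        a = a + velocity_fn(a, t, obs) * dt
    return a


if __name__ == "__main__":
    rvq = ResidualVQ()
    actions = torch.randn(2, 64)               # stand-in for an action chunk
    tokens = rvq.encode(actions)                # discrete action tokens
    recon = rvq.decode(tokens)                  # back to continuous actions
    print(tokens.shape, recon.shape)

    # toy velocity network standing in for the low-latency action expert
    obs = torch.randn(2, 128)
    net = nn.Sequential(nn.Linear(64 + 1 + 128, 256), nn.ReLU(), nn.Linear(256, 64))
    vel = lambda a, t, o: net(torch.cat([a, t, o], dim=-1))
    sampled = flow_matching_sample(vel, obs)
    print(sampled.shape)
```

The split mirrors the description above: the auto-regressive model predicts discrete RVQ tokens, while the flow-matching expert produces continuous actions in a handful of integration steps for low-latency control.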

Quick Start & Requirements

Installation involves cloning the repository, creating a Python 3.10 conda environment, and installing PyTorch (CUDA 12.8), flash-attn, and the remaining dependencies from requirements.txt; specific robot integrations require additional packages. Inference needs an NVIDIA GPU with at least 16GB of VRAM (RTX 4090 recommended), while fine-tuning needs 32GB+ (A100/H100 for RDT2-VQ LoRA or full fine-tuning). The system is tested on Ubuntu 24.04. Critical setup also includes acquiring the designated end effectors and cameras and performing detailed calibration.
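Before installing, it may help to confirm that the local GPU meets the thresholds above. The snippet below is a minimal convenience check using PyTorch; the 16 GB and 32 GB figures are taken from this section, and the snippet is not part of the RDT2 repository.

```python
import torch

# Compare local VRAM against the stated requirements:
# 16 GB for inference, 32 GB+ for fine-tuning.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA device found; an NVIDIA GPU is required.")

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"{props.name}: {vram_gb:.1f} GB VRAM, CUDA {torch.version.cuda}")
print("inference  :", "OK" if vram_gb >= 16 else "below the 16 GB recommendation")
print("fine-tuning:", "OK" if vram_gb >= 32 else "below the 32 GB recommendation")
```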

Health Check

Last Commit: 4 days ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 7

Star History

515 stars in the last 26 days
