3D-Diffusion-Policy by YanjieZe

Generalizable visuomotor policy learning with 3D representations

Created 1 year ago

1,212 stars

Top 32.3% on SourcePulse

Project Summary

3D Diffusion Policy (DP3) offers a generalizable visuomotor policy learning framework for robotics, leveraging 3D visual representations and diffusion models. It targets researchers and practitioners in robotics and imitation learning, enabling effective control across diverse simulated and real-world tasks with practical inference speeds.

How It Works

DP3 pairs 3D visual input (depth and point clouds) with a diffusion policy that is trained from demonstrations. The 3D representation captures spatial information that 2D images or simpler representations miss, which improves generalization and task performance. The diffusion model, in turn, generates complex multi-step action sequences.
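To make the 3D input concrete, here is a minimal sketch of how a depth image can be turned into a point cloud and then thinned to a fixed number of points. It is not taken from the DP3 codebase: the function names, the toy intrinsics (`fx`, `fy`, `cx`, `cy`), and the choice of farthest point sampling as the downsampling step are assumptions made for illustration, and the code uses only the Python standard library, where a real pipeline would more likely use NumPy or a point cloud library.

```python
import math
import random


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of metres) into camera-frame XYZ points.

    Hypothetical helper using the standard pinhole camera model.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # zero or negative depth marks an invalid pixel
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def farthest_point_sampling(points, k, seed=0):
    """Pick k well-spread points by greedily taking the farthest remaining one."""
    if k <= 0:
        return []
    if k >= len(points):
        return list(points)
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    chosen = [first]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        for i, p in enumerate(points):
            d = math.dist(p, points[idx])
            if d < nearest[i]:
                nearest[i] = d
    return [points[i] for i in chosen]


# Toy 4x4 depth image in metres; 0.0 marks a pixel with no depth reading.
depth = [
    [1.0, 1.0, 1.2, 0.0],
    [1.0, 1.1, 1.2, 1.3],
    [0.0, 1.1, 1.2, 1.3],
    [1.0, 1.1, 1.2, 1.3],
]
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
sampled = farthest_point_sampling(cloud, k=4)
print(len(cloud), len(sampled))  # prints "14 4"
```

A fixed-size, well-spread point set like `sampled` is the kind of compact 3D observation a policy can condition on. The number of points kept and the cropping applied beforehand are task-specific choices that this sketch does not cover.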

Quick Start & Requirements

Installation: Follow instructions in INSTALL.md.
Prerequisites: Ubuntu 20.04.01 and Python. The listed robot-control software is Franka Interface Control, Frankx, and Allegro Hand Controller - Noetic. Real-world deployment also requires specific hardware: a Franka robot, an Allegro Hand, and a RealSense L515 camera.
Data: Requires downloading expert policies for Adroit and DexArt, and assets for DexArt. Real-world data can also be used.
Setup: Training DP3 requires ~10GB GPU memory and ~3 hours on an Nvidia A40. A simplified version (simple_dp3.yaml) offers faster training (1-2 hours) and inference (25 FPS).
Links: Project Page, arXiv

Highlighted Details

Supports 57 tasks across the Adroit, DexArt, and MetaWorld environments, with 3D observations generated for each.
Provides scripts for demonstration generation, training, and evaluation, logging results with wandb.
Includes a visualizer for point clouds in headless environments.
Offers guidance for adapting DP3 to custom tasks by adding environment wrappers, runners, data loaders, and config files.

Maintenance & Community

The project is associated with Yanjie Ze. Several community extensions and applications are listed on arXiv, indicating active research interest. Contact Yanjie Ze for questions.

Licensing & Compatibility

Released under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Real-world deployment requires specific hardware, and some cameras (e.g., the RealSense D435) may degrade performance because of their point cloud quality. Imitation learning is sensitive to demonstration quality, so demonstrations may need to be regenerated if initial results are poor.
