3D-Diffusion-Policy  by YanjieZe

Generalizable visuomotor policy learning with 3D representations

Created 1 year ago
1,046 stars

Top 36.0% on SourcePulse

Project Summary

3D Diffusion Policy (DP3) offers a generalizable visuomotor policy learning framework for robotics, leveraging 3D visual representations and diffusion models. It targets researchers and practitioners in robotics and imitation learning, enabling effective control across diverse simulated and real-world tasks with practical inference speeds.

How It Works

DP3 integrates 3D visual data (depth and point clouds) with diffusion policies, allowing for learning from demonstrations. This approach captures rich spatial information, leading to improved generalization and performance compared to methods relying solely on 2D images or simpler representations. The use of diffusion models enables efficient generation of complex action sequences.
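
To make the depth-to-point-cloud step concrete, here is a minimal sketch of back-projecting a depth image through a pinhole camera model and subsampling to a fixed number of points. It is an illustration only, not DP3's actual preprocessing: the function names, the intrinsics (fx, fy, cx, cy), and the random subsampling strategy are all assumptions made for this example.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) to an (N, 3) camera-frame point cloud.

    Assumes a pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    h, w = depth.shape
    # u holds pixel column indices, v holds pixel row indices, each of shape (h, w).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Pixels with non-positive depth are treated as invalid and dropped.
    return points[points[:, 2] > 0]

def subsample(points, num_points, rng):
    """Randomly pick a fixed number of points (sampling with replacement only if too few)."""
    replace = len(points) < num_points
    idx = rng.choice(len(points), size=num_points, replace=replace)
    return points[idx]

# Toy 4x4 depth image: a flat surface 2 m away with one invalid pixel.
rng = np.random.default_rng(0)
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
fixed = subsample(cloud, num_points=8, rng=rng)
```

A fixed-size cloud is convenient because a point-cloud encoder can then be batched over observations; random subsampling is used here only for brevity, and other strategies may preserve geometry better.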

Quick Start & Requirements

  • Installation: Follow instructions in INSTALL.md.
  • Prerequisites: Ubuntu 20.04.01 and Python. Real-world deployment additionally requires specific hardware (Franka robot, Allegro Hand, RealSense L515 camera) and the matching control software: Franka Interface Control, Frankx, and the Allegro Hand Controller (ROS Noetic).
  • Data: Requires downloading expert policies for Adroit and DexArt, and assets for DexArt. Real-world data can also be used.
  • Setup: Training DP3 requires ~10GB of GPU memory and takes ~3 hours on an NVIDIA A40. A simplified configuration (simple_dp3.yaml) trains faster (1-2 hours) and runs inference at 25 FPS.
  • Links: Project Page, arXiv

Highlighted Details

  • Supports 57 tasks across Adroit, DexArt, and MetaWorld environments with 3D modality generation.
  • Provides scripts for demonstration generation, training, and evaluation, logging results with wandb.
  • Includes a visualizer for point clouds in headless environments.
  • Offers guidance for adapting DP3 to custom tasks by adding environment wrappers, runners, data loaders, and config files.
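
As a rough idea of what an environment wrapper for a custom task might look like, the sketch below adapts a toy environment so that every observation is a dictionary with a fixed-size point cloud and a robot state vector. Everything here is hypothetical: the class names, the "point_cloud" and "agent_pos" keys, and the padding rule are assumptions for illustration, not the repository's actual interface. Consult the repository's own guidance for the real requirements.

```python
class ToyEnv:
    """Stand-in environment (hypothetical) that returns raw joints and scene points."""

    def reset(self):
        self.t = 0
        return {"joints": [0.0, 0.0], "points": [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]}

    def step(self, action):
        self.t += 1
        obs = {"joints": list(action), "points": [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)]}
        reward = 0.0
        done = self.t >= 3  # fixed-length toy episode
        return obs, reward, done, {}


class PointCloudObsWrapper:
    """Wraps an environment so observations have a fixed-size point cloud.

    The output keys are illustrative assumptions, not a confirmed DP3 schema.
    """

    def __init__(self, env, num_points):
        self.env = env
        self.num_points = num_points

    def _convert(self, raw):
        # Truncate to num_points, then pad by repeating the last point if too few.
        pts = list(raw["points"][: self.num_points])
        while len(pts) < self.num_points:
            pts.append(pts[-1])
        return {"point_cloud": pts, "agent_pos": list(raw["joints"])}

    def reset(self):
        return self._convert(self.env.reset())

    def step(self, action):
        raw, reward, done, info = self.env.step(action)
        return self._convert(raw), reward, done, info


env = PointCloudObsWrapper(ToyEnv(), num_points=4)
first_obs = env.reset()
next_obs, reward, done, info = env.step([0.5, -0.5])
```

Keeping the conversion in one `_convert` method means the same observation layout is produced by both `reset` and `step`, which is the property a downstream data loader or runner would rely on.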

Maintenance & Community

The project is associated with Yanjie Ze. Several community extensions and applications, published on arXiv, are listed, indicating active research interest. Questions should be directed to Yanjie Ze.

Licensing & Compatibility

Released under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Real-world deployment requires specific hardware, and some cameras (e.g., RealSense D435) may hurt performance because of their lower point cloud quality. Imitation learning performance is sensitive to demonstration quality, so demonstrations may need to be regenerated if the initial ones are poor.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 40 stars in the last 30 days
