Vision-language-action framework for dexterous grasping
DexGraspVLA is a vision-language-action framework designed for general dexterous grasping in complex, real-world scenarios. It targets researchers and engineers in robotics and AI, offering a robust solution for zero-shot grasping with high success rates, even with unseen objects and under challenging conditions. The framework excels at long-horizon tasks requiring complex reasoning, human disturbance handling, and failure recovery.
How It Works
DexGraspVLA employs a hierarchical approach. A pre-trained vision-language model (Qwen2.5-VL-72B-Instruct) acts as the high-level task planner, interpreting natural language commands and scene context. A diffusion-based policy serves as the low-level action controller, learning dexterous grasping movements from demonstrations. This pairing combines the generalization of a foundation model with the precise control of a diffusion policy trained by imitation learning.
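To make the division of labor concrete, here is a minimal, self-contained sketch of such a planner/controller loop. Every name in it (DummyPlanner, DummyDiffusionController, grasp_loop, the placeholder action vector) is an illustrative assumption rather than DexGraspVLA's actual code; in the real system the dummies are replaced by Qwen2.5-VL-72B-Instruct and the trained diffusion policy.

```python
# Conceptual sketch only: none of these names correspond to DexGraspVLA's code.
# It illustrates the hierarchy: a VLM planner picks *what* to grasp at a low
# rate, while a diffusion-policy controller decides *how* to move at every step.
import random
from dataclasses import dataclass


@dataclass
class Subgoal:
    target: str          # object the planner wants grasped next
    done: bool = False   # planner's judgment that the task is complete


class DummyPlanner:
    """Stand-in for the high-level planner. The real system prompts a
    vision-language model (Qwen2.5-VL-72B-Instruct) with the scene image and
    the user instruction; here we just pop objects from a fixed list."""

    def __init__(self, objects):
        self.remaining = list(objects)

    def plan(self, image, instruction):
        if not self.remaining:
            return Subgoal(target="", done=True)
        return Subgoal(target=self.remaining.pop(0))


class DummyDiffusionController:
    """Stand-in for the low-level controller. The real system denoises an
    action chunk conditioned on observations and the planner's target; here
    we return a random placeholder action vector."""

    def act(self, observation, subgoal):
        return [random.uniform(-0.05, 0.05) for _ in range(12)]


def grasp_loop(planner, controller, instruction, control_steps_per_plan=10):
    """Outer loop: re-plan at a low rate, act at a high rate."""
    image, observation = None, None   # placeholders for camera / robot state
    subgoal = planner.plan(image, instruction)
    while not subgoal.done:
        for _ in range(control_steps_per_plan):
            action = controller.act(observation, subgoal)
            # A real system would send `action` to the arm and hand here.
        print(f"Attempted grasp of: {subgoal.target}")
        subgoal = planner.plan(image, instruction)  # monitor, recover, or finish


grasp_loop(DummyPlanner(["red cup", "toy block"]),
           DummyDiffusionController(),
           "clear the table")
```

The point the sketch makes is that re-planning runs at a much lower rate than control, which is what lets the planner monitor progress, react to disturbances, and trigger recovery while the controller handles moment-to-moment motion.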
Quick Start & Requirements
Create and activate a conda environment (conda create -n dexgraspvla python=3.9, conda activate dexgraspvla), then run pip install -r requirements.txt. Install SAM and Cutie following their respective instructions. An example grasp demonstration (grasp_demo_example.tar.gz) is provided for understanding the data format and training.
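For the data-format step, a short inspection snippet like the following can help. Only the archive name grasp_demo_example.tar.gz comes from the README; the output directory and the number of entries listed are arbitrary choices.

```python
# Peek inside the example demonstration archive to see how it is organized
# before wiring it into training.
import tarfile
from pathlib import Path

archive = Path("grasp_demo_example.tar.gz")
out_dir = Path("grasp_demo_example")

with tarfile.open(archive, "r:gz") as tar:
    for member in tar.getmembers()[:20]:   # list the first few entries
        print(member.name, member.size)
    tar.extractall(out_dir)                # extract for a closer look
```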
Highlighted Details
A pre-trained controller checkpoint (dexgraspvla-controller-20250320) is released for immediate use.
Maintenance & Community
The project is associated with authors from institutions including Tsinghua University. Links to community channels are not explicitly provided in the README.
Licensing & Compatibility
The repository's license is not specified in the README. The hardware-related code is withheld due to intellectual property constraints.
Limitations & Caveats
The README explicitly states that hardware-related code is not open-sourced due to IP constraints, which may prevent full replication of the inference setup. The planner additionally relies on a large, potentially costly foundation model.