dexbotic by Dexmal

Vision-Language-Action toolbox for embodied intelligence research

Created 2 weeks ago

375 stars

Top 75.7% on SourcePulse

View on GitHub

Project Summary

Dexbotic is an open-source toolbox designed to streamline Vision-Language-Action (VLA) research for professionals in the embodied intelligence field. It offers a unified, modular framework supporting multiple mainstream VLA policies and LLM interfaces, simplifying the reproduction of complex robotic tasks. The project provides powerful pre-trained foundation models that yield significant performance improvements across various simulators and real-world robotic applications, with continuous updates planned for new models.

How It Works

Dexbotic employs a unified modular VLA framework that integrates embodied manipulation and navigation, compatible with open-source LLM interfaces. Its core advantage lies in its experiment-centric development approach, utilizing a "layered configuration + factory registration + entry dispatch" pattern. This design allows for high flexibility and extensibility, enabling users to easily modify configurations, swap models, or add new tasks by altering experimental scripts, while maintaining system stability.
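
As a rough illustration of how such a "layered configuration + factory registration + entry dispatch" pattern typically fits together, here is a minimal Python sketch; every name in it (ExperimentConfig, POLICY_REGISTRY, register_policy, Pi0Policy) is hypothetical and does not reflect Dexbotic's actual API.

```python
# Hypothetical sketch of "layered configuration + factory registration + entry dispatch".
# Names and fields are illustrative only, not Dexbotic's real interfaces.
from dataclasses import dataclass

# --- Layered configuration: a base layer that experiment scripts override ---
@dataclass
class BaseConfig:
    lr: float = 1e-4
    batch_size: int = 32

@dataclass
class ExperimentConfig(BaseConfig):
    policy: str = "pi0"      # which registered policy to build
    batch_size: int = 64     # experiment-level override of the base layer

# --- Factory registration: policies register themselves under a string key ---
POLICY_REGISTRY: dict[str, type] = {}

def register_policy(name: str):
    def decorator(cls):
        POLICY_REGISTRY[name] = cls
        return cls
    return decorator

@register_policy("pi0")
class Pi0Policy:
    def __init__(self, cfg: ExperimentConfig):
        self.cfg = cfg

    def train(self):
        print(f"training {self.cfg.policy} with lr={self.cfg.lr}, bs={self.cfg.batch_size}")

# --- Entry dispatch: a single entry point looks up the policy and runs it ---
def main(cfg: ExperimentConfig):
    policy_cls = POLICY_REGISTRY[cfg.policy]   # dispatch on the config value
    policy_cls(cfg).train()

if __name__ == "__main__":
    main(ExperimentConfig())
```

Because the entry point dispatches purely on a registry key taken from the configuration, swapping policies or adding a new task reduces to editing the experiment script or registering another class, which is the flexibility the framework description above emphasizes.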

Quick Start & Requirements

  • Primary Install: Docker is recommended for a consistent environment. Conda installation is also supported.
  • Prerequisites (a quick environment-check sketch follows this list):
    • Docker: Ubuntu 20.04/22.04, NVIDIA GPU (RTX 4090/A100/H100), NVIDIA Docker.
    • Conda: Ubuntu 20.04/22.04, NVIDIA GPU (RTX 4090/A100/H100), CUDA 11.8 (tested), Anaconda, Python 3.10.
  • Links:
    • Repository: https://github.com/Dexmal/dexbotic.git
    • FlashAttention Docs: https://github.com/Dao-AILab/flash-attention
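
An unofficial way to sanity-check the GPU/CUDA prerequisites above is a short PyTorch snippet like the one below. It assumes PyTorch is already installed in the environment and is not part of Dexbotic.

```python
# Minimal environment sanity check for the GPU/CUDA prerequisites listed above.
# Assumes PyTorch is installed; this is not an official Dexbotic script.
import torch

print(f"PyTorch version : {torch.__version__}")
print(f"CUDA build      : {torch.version.cuda}")          # expected around 11.8
print(f"CUDA available  : {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU             : {torch.cuda.get_device_name(0)}")

# FlashAttention is installed separately (see the docs link above);
# this only reports whether the package can be imported.
try:
    import flash_attn
    print(f"flash-attn      : {flash_attn.__version__}")
except ImportError:
    print("flash-attn      : not installed")
```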

Highlighted Details

  • Supports multiple leading embodied manipulation and navigation policies, including Pi0, OFT, CogACT, MemoryVLA, and MUVLA.
  • Offers powerful pre-trained foundation models (e.g., Dexbotic-Base, Dexbotic-CogACT, Dexbotic-Pi0) for enhanced performance.
  • Provides unified data formats and deployment scripts for diverse robot support, including UR5, Franka, and ALOHA (a hypothetical record layout is sketched after this list).
  • Features cloud and local training capabilities, supporting platforms like Alibaba Cloud and consumer-grade GPUs (RTX 4090).
  • Extensive benchmark results demonstrate significant performance gains across Libero, CALVIN, Simpler-Env, ManiSkill2, and RoboTwin2.0.
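
To make the "unified data format" idea concrete, below is a purely hypothetical sketch of a per-timestep record that converters for different robots could all emit; the field names and shapes are assumptions, not Dexbotic's actual schema.

```python
# Hypothetical illustration of a unified VLA data record.
# Field names and shapes are assumptions, not Dexbotic's real schema.
from dataclasses import dataclass
import numpy as np

@dataclass
class VLASample:
    image: np.ndarray        # RGB observation, e.g. (H, W, 3)
    instruction: str         # natural-language task description
    state: np.ndarray        # proprioceptive robot state (robot-specific dims)
    action: np.ndarray       # target action chunk, normalized per robot

# Converters for UR5, Franka, or ALOHA data could all produce this record shape,
# so one training pipeline can consume all of them.
sample = VLASample(
    image=np.zeros((224, 224, 3), dtype=np.uint8),
    instruction="pick up the red block",
    state=np.zeros(7, dtype=np.float32),
    action=np.zeros((8, 7), dtype=np.float32),
)
print(sample.instruction)
```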

Maintenance & Community

The README lists a release date of 2025-10-20. No community channels (e.g., Discord, Slack), roadmap links, or notable contributor information are detailed in the provided README excerpt.

Licensing & Compatibility

The license type and any compatibility notes for commercial use or closed-source linking are not specified in the provided README content.

Limitations & Caveats

Installing FlashAttention can be challenging and may require consulting its official documentation. Some models and policies listed in the project's open-source plan are marked as not yet available ('✖️'). Training is recommended on 8 NVIDIA A100/H100 GPUs for best performance, though local training on consumer-grade hardware is supported.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 9
  • Star History: 386 stars in the last 19 days
