WholebodyVLA by OpenDriveLab

Vision-Language-Action for humanoid loco-manipulation

Created 7 months ago

523 stars

Top 59.5% on SourcePulse

Project Summary

WholeBodyVLA is a unified Vision-Language-Action (VLA) framework for closed-loop humanoid loco-manipulation control in large-scale environments. Aimed at robotics researchers and engineers, it learns unified latent actions from unlabeled egocentric videos, enabling manipulation and locomotion tasks that require precise, stable whole-body coordination.

How It Works

The framework uses a Latent Action Model (LAM) to derive unified latent actions directly from action-free egocentric videos. This learned representation is combined with a loco-manipulation-oriented (LMO) reinforcement learning policy. At run time, the system encodes visual input and language instructions into latent action tokens, then decodes those tokens into dual-arm joint actions and locomotion commands, giving end-to-end control. This design lets the model learn from diverse, unannotated video data and maintain robust coordination under disturbances.
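Because the codebase is not public, the following is only a minimal Python sketch of the data flow described above, not the authors' implementation. Every name, table, and value in it is hypothetical. It assumes a nearest-codebook quantizer as a stand-in for the LAM (a common way to discretize latent actions, not confirmed by the source) and uses lookup tables in place of the learned decoder.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass(frozen=True)
class LocomotionCommand:
    """Hypothetical high-level command handed to the locomotion policy."""
    forward_velocity: float  # m/s
    yaw_rate: float          # rad/s


def nearest_code(delta: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `delta` (squared L2)."""
    def dist(code: Vector) -> float:
        return sum((d - c) ** 2 for d, c in zip(delta, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def label_latent_action(frame_t: Vector, frame_next: Vector,
                        codebook: Sequence[Vector]) -> int:
    """LAM-style labeling (assumed): quantize the change between two
    consecutive frame feature vectors into a discrete latent action token.
    Only the two frames are needed, no robot action labels."""
    delta = [b - a for a, b in zip(frame_t, frame_next)]
    return nearest_code(delta, codebook)


def decode_token(token: int,
                 arm_table: Sequence[Vector],
                 loco_table: Sequence[LocomotionCommand],
                 ) -> Tuple[List[float], LocomotionCommand]:
    """Decode a latent action token into dual-arm joint targets and a
    locomotion command. Lookup tables stand in for a learned decoder."""
    return list(arm_table[token]), loco_table[token]


def control_step(observation: object, instruction: str,
                 predict_token: Callable[[object, str], int],
                 arm_table: Sequence[Vector],
                 loco_table: Sequence[LocomotionCommand],
                 ) -> Tuple[List[float], LocomotionCommand]:
    """One closed-loop step: (image, instruction) -> token -> actions.
    `predict_token` is a placeholder for the VLA model."""
    token = predict_token(observation, instruction)
    return decode_token(token, arm_table, loco_table)


# Toy data: 2-D frame features and three latent actions.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ARM_TABLE = [(0.0, 0.0), (0.2, -0.2), (0.5, 0.5)]
LOCO_TABLE = [
    LocomotionCommand(0.0, 0.0),
    LocomotionCommand(0.4, 0.0),
    LocomotionCommand(0.0, 0.3),
]

# Label a frame pair from "video", then decode the resulting token.
token = label_latent_action((0.1, 0.1), (1.0, 0.2), CODEBOOK)
arm_targets, loco_cmd = decode_token(token, ARM_TABLE, LOCO_TABLE)
```

In this sketch the locomotion command would be passed on to the LMO policy, which is assumed to turn it into lower-body motion; that policy is not modeled here.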

Quick Start & Requirements

The project README explicitly states, "We currently have no concrete timeline for open-sourcing the codebase." There is therefore nothing to install or run: the repository serves as a collection of resources and references for researchers working on VLA for humanoids, not as a deployable system.

Highlighted Details

A unified VLA framework for closed-loop humanoid loco-manipulation in large spaces.
Novel approach for learning unified latent actions from manipulation and locomotion videos without explicit action annotations.
A loco-manipulation-oriented (LMO) RL policy designed for precise and stable whole-body coordination, even under disturbances.

Maintenance & Community

The primary contact is Haoran Jiang (jianghaoran2024@gmail.com). Listed contributors include Jin Chen, Yucheng Huang, Haoran Jiang, Yixuan Pan, Shijia Peng, Jialong Zeng, and Hai Zhang. The repository encourages discussion and collaboration among researchers working on VLA for humanoids, and links to the project's arXiv paper and project page.

Licensing & Compatibility

No specific open-source license information is provided in the README.

Limitations & Caveats

The primary limitation is that the codebase is not available, and the authors give no timeline for open-sourcing it. Until it is released, the repository is a resource hub and reference collection rather than runnable software.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

22 stars in the last 30 days