OpenDriveLab: Vision-Language-Action for humanoid loco-manipulation
Top 98.8% on SourcePulse
WholeBodyVLA presents a unified Vision-Language-Action (VLA) framework designed for closed-loop humanoid loco-manipulation control in large-scale environments. It targets researchers and engineers in robotics, offering a novel method to learn unified latent actions from unlabeled egocentric videos, thereby enabling complex manipulation and locomotion tasks with precise, stable whole-body coordination.
How It Works
The framework employs a Latent Action Model (LAM) to derive unified latent actions directly from action-free egocentric videos. This learned representation is then integrated with a loco-manipulation-oriented (LMO) reinforcement learning policy. The system encodes visual input and language instructions into latent action tokens, which are subsequently decoded into dual-arm joint actions and locomotion commands, facilitating end-to-end control for sophisticated tasks. This approach allows for learning from diverse, unannotated video data and achieving robust coordination under disturbances.
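Because the codebase is not public, the data flow described above can only be illustrated with a toy sketch. The Python below is not the project's implementation: every name, constant, and formula in it (`encode_to_latent_tokens`, `decode_latent_tokens`, `CODEBOOK_SIZE`, `ARM_DOF`, the hash-style encoder) is a hypothetical stand-in, and treating the latent action tokens as discrete codebook indices is an assumption. It only shows the shape of the pipeline: vision plus language in, latent action tokens in the middle, dual-arm joint targets and a locomotion command out. In the described system the locomotion command would be executed by the LMO reinforcement learning policy, which is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Sequence

# All constants are illustrative assumptions, not values from the project.
NUM_LATENT_TOKENS = 4   # latent action tokens produced per control step
CODEBOOK_SIZE = 16      # assumed size of a discrete latent-action codebook
ARM_DOF = 14            # assumed: 7 joints per arm, two arms


@dataclass
class LocomotionCommand:
    forward_velocity: float  # m/s, would be passed to a low-level locomotion policy
    yaw_rate: float          # rad/s


@dataclass
class WholeBodyAction:
    arm_joint_targets: List[float]
    locomotion: LocomotionCommand


def encode_to_latent_tokens(image_features: Sequence[float], instruction: str) -> List[int]:
    """Toy stand-in for the VLA backbone: vision + language -> latent action tokens."""
    text_code = sum(ord(ch) for ch in instruction)
    visual_code = int(sum(image_features) * 1000)
    return [(text_code + visual_code + 7 * i) % CODEBOOK_SIZE
            for i in range(NUM_LATENT_TOKENS)]


def decode_latent_tokens(tokens: Sequence[int]) -> WholeBodyAction:
    """Toy stand-in for the decoder: latent tokens -> arm targets + locomotion command."""
    scaled = [t / (CODEBOOK_SIZE - 1) for t in tokens]  # each value in [0, 1]
    arm_targets = [scaled[j % len(scaled)] - 0.5 for j in range(ARM_DOF)]
    locomotion = LocomotionCommand(
        forward_velocity=0.5 * scaled[0],
        yaw_rate=scaled[-1] - 0.5,
    )
    return WholeBodyAction(arm_targets, locomotion)


def control_step(image_features: Sequence[float], instruction: str) -> WholeBodyAction:
    """One closed-loop step: re-encode the latest observation, then decode an action."""
    tokens = encode_to_latent_tokens(image_features, instruction)
    return decode_latent_tokens(tokens)


if __name__ == "__main__":
    # Closed loop: each iteration uses a fresh (here, made-up) egocentric observation.
    for frame in ([0.10, 0.20, 0.30], [0.12, 0.18, 0.33]):
        action = control_step(frame, "pick up the box and carry it to the table")
        print(action.locomotion, action.arm_joint_targets[:2])
```

The point of the split is that the encoder and decoder meet only at the latent tokens, which is what would let the token space be learned from action-free video while the decoder maps it to a specific robot.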
Quick Start & Requirements
The project README states, "We currently have no concrete timeline for open-sourcing the codebase." The repository therefore serves, for now, as a collection of resources and references for the VLA-on-humanoids research community rather than as a deployable system.
Maintenance & Community
The primary contact is Haoran Jiang (jianghaoran2024@gmail.com). The project lists several contributors, including Jin Chen, Yucheng Huang, Haoran Jiang, Yixuan Pan, Shijia Peng, Jialong Zeng, and Hai Zhang. The repository encourages discussion and collaboration within the VLA on humanoids research community. Links to the project's arXiv paper and project page are provided.
Licensing & Compatibility
No specific open-source license information is provided in the README.
Limitations & Caveats
The primary limitation is that the codebase is not yet available, and no timeline has been set for open-sourcing it. Until it is released, the repository functions only as a resource hub and reference collection for the research community.