Autoregressive action world model for robotics
WorldVLA is an autoregressive action world model that unifies vision, language, and action understanding and generation for robotics. It targets researchers and developers in embodied AI and robotics, enabling tasks like generating robot actions from text and images, and predicting future states from actions.
How It Works
WorldVLA integrates a Vision-Language-Action (VLA) model for action generation and a world model for next-frame prediction within a single framework. It leverages the autoregressive capabilities of large language models, adapted for multimodal inputs (images and actions), to predict sequences of actions or future visual states. This unified approach aims to improve the coherence and efficiency of robot control and simulation.
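As an illustration of this unified formulation, the minimal conceptual sketch below (not WorldVLA's actual implementation; the tokenizers and the model's predict_next_token interface are hypothetical placeholders) shows how action generation and next-frame prediction both reduce to the same next-token decoding loop over one interleaved multimodal token sequence.

```python
# Conceptual sketch: action generation and next-frame prediction both become
# next-token prediction over a single interleaved multimodal token sequence.
# The tokenizers and predict_next_token are hypothetical placeholders,
# not WorldVLA's real API.
from typing import List


def action_generation_prefix(text_tokens: List[int], image_tokens: List[int]) -> List[int]:
    # VLA mode: condition on the language instruction and current observation;
    # the model then continues the sequence with discretized action tokens.
    return text_tokens + image_tokens


def world_model_prefix(image_tokens: List[int], action_tokens: List[int]) -> List[int]:
    # World-model mode: condition on the current observation and an action;
    # the model then continues the sequence with next-frame image tokens.
    return image_tokens + action_tokens


def generate(model, prefix: List[int], num_new_tokens: int) -> List[int]:
    # Shared autoregressive decoding loop used by both modes.
    seq = list(prefix)
    for _ in range(num_new_tokens):
        seq.append(model.predict_next_token(seq))
    return seq[len(prefix):]
```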
Quick Start & Requirements
Setup involves creating the conda environment (conda env create -f environment.yml), cloning the LIBERO repository (git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git), and installing it from inside that directory (pip install -e .).
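As a quick sanity check that LIBERO is importable, something like the following should list the available task suites (this assumes the benchmark registry API described in LIBERO's README and is not part of WorldVLA itself):

```python
# Sanity check: confirm LIBERO installed correctly by listing its task suites.
# Assumes the benchmark registry API from LIBERO's README (not WorldVLA code).
from libero.libero import benchmark

benchmark_dict = benchmark.get_benchmark_dict()
print("Available LIBERO task suites:", list(benchmark_dict.keys()))
```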
Highlighted Details
Maintenance & Community
The project was released on June 23, 2025, with code for the action model on the LIBERO benchmark. Future releases are planned for the world model and real-world experiments.
Licensing & Compatibility
Licensed under the Apache 2.0 license, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The project is newly released, with the world model and real-world experiment code yet to be published. The current focus is on the LIBERO benchmark.