WorldVLA by alibaba-damo-academy

Autoregressive action world model for robotics

Created 2 months ago
410 stars

Top 71.2% on SourcePulse

Project Summary

WorldVLA is an autoregressive action world model that unifies vision, language, and action understanding and generation for robotics. It targets researchers and developers in embodied AI and robotics, enabling tasks like generating robot actions from text and images, and predicting future states from actions.

How It Works

WorldVLA integrates a Vision-Language-Action (VLA) model for action generation and a world model for next-frame prediction within a single framework. It leverages the autoregressive capabilities of large language models, adapted for multimodal inputs (images and actions), to predict sequences of actions or future visual states. This unified approach aims to improve the coherence and efficiency of robot control and simulation.
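The two modes described above can be illustrated with a toy autoregressive loop over a single shared token vocabulary. This is only a minimal sketch under stated assumptions, not the project's implementation: the token id ranges, the sequence lengths (including the 7 action tokens), and the `toy_next_token` stand-in for a trained transformer are all hypothetical and chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical unified vocabulary layout (NOT taken from the WorldVLA code).
TEXT_RANGE = (0, 100)      # ids 0..99 stand in for text tokens
IMAGE_RANGE = (100, 200)   # ids 100..199 stand in for image tokens
ACTION_RANGE = (200, 232)  # ids 200..231 stand in for discretized action tokens

NextTokenFn = Callable[[List[int], Tuple[int, int]], int]


def toy_next_token(context: List[int], allowed: Tuple[int, int]) -> int:
    """Stand-in for a trained model: deterministically maps the
    context to one token id inside the allowed half-open range."""
    lo, hi = allowed
    return lo + (sum(context) + len(context)) % (hi - lo)


def generate(
    prompt: List[int],
    n_tokens: int,
    allowed: Tuple[int, int],
    next_token: NextTokenFn = toy_next_token,
) -> List[int]:
    """Autoregressive decoding: each new token is conditioned on the
    prompt plus every token generated so far."""
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(next_token(seq, allowed))
    return seq[len(prompt):]


def predict_actions(
    text_tokens: List[int], image_tokens: List[int], n_action_tokens: int = 7
) -> List[int]:
    """Action-model mode: (instruction, observation) -> action tokens.
    The default of 7 action tokens is an illustrative assumption."""
    return generate(text_tokens + image_tokens, n_action_tokens, ACTION_RANGE)


def predict_next_frame(
    image_tokens: List[int], action_tokens: List[int], n_image_tokens: int = 4
) -> List[int]:
    """World-model mode: (observation, action) -> next-frame image tokens."""
    return generate(image_tokens + action_tokens, n_image_tokens, IMAGE_RANGE)


if __name__ == "__main__":
    instruction = [5, 17, 42]            # hypothetical text token ids
    observation = [101, 150, 199, 120]   # hypothetical image token ids
    actions = predict_actions(instruction, observation)
    next_frame = predict_next_frame(observation, actions)
    print("action tokens:", actions)
    print("next-frame tokens:", next_frame)
```

The point of the sketch is that both modes reuse the same decoding loop and differ only in what is placed in the prompt and which part of the vocabulary is decoded, which is the sense in which a single autoregressive model can serve as both action model and world model.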

Quick Start & Requirements

  • Installation: Create the Conda environment (conda env create -f environment.yml), clone the LIBERO repository (git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git), and install it in editable mode from inside the cloned directory (pip install -e .).
  • Prerequisites: LIBERO dataset, Chameleon model weights (tokenizer and starting point).
  • Resources: Training involves significant data preparation and computational resources.
  • Links: Hugging Face Checkpoints, arXiv Paper.

Highlighted Details

  • Achieves high success rates on the LIBERO benchmark for action generation (e.g., 96.2% for LIBERO-Object at 512x512 resolution).
  • Supports both 256x256 and 512x512 image resolutions.
  • Provides detailed scripts for data preparation, training, and evaluation.
  • Built upon foundational models like Lumina-mGPT, Chameleon, and OpenVLA.

Maintenance & Community

The project was released on June 23, 2025, with code for the action model on the LIBERO benchmark. Future releases are planned for the world model and real-world experiments.

Licensing & Compatibility

Licensed under the Apache 2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is newly released, with the world model and real-world experiment code yet to be published. The current focus is on the LIBERO benchmark.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 62 stars in the last 30 days
