Qwen-VLA by QwenLM

Unified vision-language-action model for embodied AI

Created 2 weeks ago

560 stars

Top 56.8% on SourcePulse

Project Summary

Qwen-VLA is a unified generalist model for embodied AI tasks such as manipulation and navigation. It targets robotics researchers and engineers, offering a single model that is reported to match or outperform task-specific specialists across diverse platforms and environments.

How It Works

Qwen-VLA pairs a Qwen3.5-4B vision-language backbone with a 1.15B-parameter DiT flow-matching action decoder. It maps heterogeneous embodied data into a shared action-and-trajectory prediction space, so a single model can learn from diverse tasks and robot embodiments. Embodiment-aware prompt conditioning takes the place of per-platform output heads. A progressive training recipe bridges discrete tokens and continuous actions in four stages: action pretraining, multimodal continued pretraining, supervised fine-tuning (SFT), and reinforcement learning (RL).
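As a rough illustration of what a flow-matching action decoder does at inference time, the sketch below integrates a velocity field from Gaussian noise (t = 0) to an action vector (t = 1) using fixed-step Euler updates. It is not Qwen-VLA's implementation: `euler_sample`, `make_toy_velocity`, the 7-dimensional target, and the step count are all hypothetical. The toy velocity field stands in for the learned DiT, which would presumably be conditioned on the vision-language backbone's features.

```python
import random
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def euler_sample(velocity: VelocityFn, dim: int, steps: int = 10, seed: int = 0) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (action)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the toy field never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> VelocityFn:
    """Hypothetical stand-in for a learned decoder.

    For a straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    x_1 - x_0 can be rewritten as (x_1 - x_t) / (1 - t).
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


# Assumed 7-dimensional action, chosen only for illustration.
target_action = [0.1, -0.2, 0.3, 0.0, 0.5, -0.4, 1.0]
action = euler_sample(make_toy_velocity(target_action), dim=len(target_action))
```

With this toy field the sampler lands on the target action. A learned decoder would replace `make_toy_velocity` with a network call, and the number of Euler steps would trade accuracy against latency.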

Quick Start & Requirements

Official information, a demo, and a technical report are available.

Highlighted Details

  • Generalist Performance: A single Qwen-VLA model matches or outperforms task-specific specialists across multiple simulation and real-world benchmarks.
  • Unified Framework: Manipulation, navigation, and trajectory prediction are handled within one shared action-and-trajectory prediction space.
  • Embodiment Agnosticism: Embodiment-aware prompt conditioning allows a single model to adapt to multiple robot platforms via text prompts.
  • OOD Generalization: Large-scale embodied pretraining yields robust out-of-distribution generalization in real-world deployments.
  • Real-World Validation: On ALOHA, the pretrained Qwen-VLA-aloha model achieved an 83.6% average success rate, surpassing specialist models.
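
One way to picture the shared prediction space and embodiment-aware prompting described in the bullets above is the minimal sketch below. It pads each platform's action vector to a common width with a validity mask, and describes the robot in the text prompt instead of selecting a per-platform output head. Everything here is an assumption made for illustration: the `SHARED_DIM` value, the `Embodiment` fields, the prompt wording, and the example platforms do not come from the Qwen-VLA README.

```python
from dataclasses import dataclass
from typing import List, Tuple

SHARED_DIM = 16  # assumed width of the shared action vector (illustrative only)


@dataclass(frozen=True)
class Embodiment:
    name: str         # hypothetical platform identifier
    action_dim: int   # number of action dimensions the platform actually uses
    description: str  # free-text description placed in the prompt


def build_prompt(embodiment: Embodiment, instruction: str) -> str:
    """Prefix the task instruction with a textual description of the robot."""
    return (
        f"Embodiment: {embodiment.name} ({embodiment.description}, "
        f"{embodiment.action_dim}-dim actions). Task: {instruction}"
    )


def to_shared(action: List[float], embodiment: Embodiment) -> Tuple[List[float], List[bool]]:
    """Zero-pad an action to SHARED_DIM and return a mask of the valid entries."""
    if len(action) != embodiment.action_dim:
        raise ValueError("action length does not match embodiment.action_dim")
    if embodiment.action_dim > SHARED_DIM:
        raise ValueError("embodiment needs more dimensions than SHARED_DIM")
    pad = SHARED_DIM - len(action)
    return list(action) + [0.0] * pad, [True] * len(action) + [False] * pad


def from_shared(padded: List[float], embodiment: Embodiment) -> List[float]:
    """Drop the padding to recover the platform-specific action."""
    return list(padded[: embodiment.action_dim])


# Two made-up platforms sharing the same padded action space.
arm = Embodiment("bimanual_arm", 14, "two 7-DoF arms")
base = Embodiment("mobile_base", 2, "differential-drive base")

arm_prompt = build_prompt(arm, "fold the towel")
padded, mask = to_shared([0.5, -0.1], base)
```

In a setup like this, the mask would let a training loss ignore padded dimensions, so platforms with different action sizes can share one decoder output.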

Maintenance & Community

Developed by the Qwen Team. The README lists no specific community channels (e.g., Discord, Slack) and no detailed roadmap. The extensive author list suggests a significant research effort.

Licensing & Compatibility

The README specifies no license. Confirm the licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

The provided README does not explicitly state any limitations, unsupported platforms, or known bugs. The model is presented as a generalist solution achieving state-of-the-art performance across various benchmarks.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 10
  • Star History: 563 stars in the last 15 days
