OpenDriveVLA by DriveVLA

End-to-end autonomous driving with a VLA model

Created 6 months ago
384 stars

Top 74.4% on SourcePulse

Project Summary

OpenDriveVLA aims to provide an end-to-end solution for autonomous driving using a large vision-language-action model. This project targets researchers and developers in the autonomous driving and AI fields, offering a unified framework for processing visual, linguistic, and action-based data.

How It Works

The project leverages a large vision-language-action model architecture, integrating components from established libraries like LLaVA-NeXT, Qwen2.5, and UniAD. This approach allows for a holistic understanding of the driving environment and the generation of appropriate driving actions, potentially simplifying the complex pipeline of traditional autonomous driving systems.
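
Since the model code has not yet been released, the following is only a minimal conceptual sketch of how such a vision-language-action pipeline might be wired together in PyTorch. The module names, dimensions, and the tiny Transformer backbone are illustrative stand-ins, not OpenDriveVLA's actual architecture.

```python
# Conceptual VLA driving sketch (illustrative only, not the OpenDriveVLA code):
# multi-view camera features and an embedded text prompt are fused by a
# language-style backbone, and an action head decodes a short planned trajectory.
import torch
import torch.nn as nn


class ToyVLADriver(nn.Module):
    def __init__(self, vision_dim=256, token_dim=256, horizon=6):
        super().__init__()
        # Stand-in for a pretrained vision encoder (e.g. a LLaVA-NeXT-style tower).
        self.vision_proj = nn.Linear(vision_dim, token_dim)
        # Stand-in for the language backbone (e.g. Qwen2.5); a tiny Transformer here.
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        # Action head: regresses (x, y) waypoints over a short planning horizon.
        self.action_head = nn.Linear(token_dim, horizon * 2)
        self.horizon = horizon

    def forward(self, camera_feats, text_embeds):
        # camera_feats: (B, num_views, vision_dim) pooled per-camera features
        # text_embeds:  (B, num_tokens, token_dim) embedded driving instruction
        vis_tokens = self.vision_proj(camera_feats)
        tokens = torch.cat([vis_tokens, text_embeds], dim=1)
        fused = self.backbone(tokens)
        # Pool the fused sequence and decode the planned waypoints.
        waypoints = self.action_head(fused.mean(dim=1))
        return waypoints.view(-1, self.horizon, 2)


if __name__ == "__main__":
    model = ToyVLADriver()
    cams = torch.randn(1, 6, 256)     # six surround-view cameras
    prompt = torch.randn(1, 8, 256)   # embedded instruction tokens
    print(model(cams, prompt).shape)  # -> torch.Size([1, 6, 2])
```

In the real system the visual tokens would come from a pretrained encoder and the backbone from a large language model such as Qwen2.5; the toy modules above only mirror the data flow from perception and language to driving actions.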

Quick Start & Requirements

The environment setup has been released, including customized builds of mmcv and mmdet3d to ensure compatibility with PyTorch 2.1.2, Transformers, and DeepSpeed. Inference code, checkpoints, and training code are planned for release.
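
As a rough starting point, the snippet below (an illustrative sketch, not an official script) checks that the dependencies named above are importable and reports their versions. Only the PyTorch 2.1.2 pin comes from the README; the other packages are left unpinned here.

```python
# Sanity-check the key dependencies mentioned in the README.
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {
    "torch": "2.1.2",      # PyTorch pin stated in the README
    "transformers": None,  # version not pinned in the summary
    "deepspeed": None,     # version not pinned in the summary
    "mmcv": None,          # customized build shipped with the repo
    "mmdet3d": None,       # customized build shipped with the repo
}

for pkg, expected in EXPECTED.items():
    try:
        found = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")
        continue
    status = "OK" if expected is None or found == expected else f"expected {expected}"
    print(f"{pkg}: {found} ({status})")
```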

Highlighted Details

  • Paper available on arXiv (2503.23463).
  • Environment setup released, with customized third-party libraries.
  • Dependencies include PyTorch 2.1.2, Transformers, and DeepSpeed.
  • Acknowledgements include LLaVA-NeXT, Qwen2.5, UniAD, mmcv, mmdet3d, GPT-Driver, Hint-AD, and TOD3Cap.

Maintenance & Community

The project is under active development, with a roadmap that includes releasing model code, checkpoints, and inference and training code.

Licensing & Compatibility

The project's license is not specified in the provided README.

Limitations & Caveats

The release of core components such as inference code, checkpoints, and training code is still pending.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 56 stars in the last 30 days
