OpenDriveVLA by DriveVLA

End-to-end autonomous driving with a VLA model

Created 6 months ago
384 stars

Top 74.4% on SourcePulse

Project Summary

OpenDriveVLA aims to provide an end-to-end solution for autonomous driving using a large vision-language-action model. This project targets researchers and developers in the autonomous driving and AI fields, offering a unified framework for processing visual, linguistic, and action-based data.

How It Works

The project leverages a large vision-language-action model architecture, integrating components from established libraries like LLaVA-NeXT, Qwen2.5, and UniAD. This approach allows for a holistic understanding of the driving environment and the generation of appropriate driving actions, potentially simplifying the complex pipeline of traditional autonomous driving systems.
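
Since the model code has not yet been released, the following is only a minimal conceptual sketch of how such a vision-language-action pipeline might be wired together in PyTorch. The module names, dimensions, and the tiny Transformer backbone are illustrative stand-ins, not OpenDriveVLA's actual architecture.

```python
# Conceptual VLA driving sketch (illustrative only, not the OpenDriveVLA code):
# multi-view camera features and an embedded text prompt are fused by a
# language-style backbone, and an action head decodes a short planned trajectory.
import torch
import torch.nn as nn


class ToyVLADriver(nn.Module):
    def __init__(self, vision_dim=256, token_dim=256, horizon=6):
        super().__init__()
        # Stand-in for a pretrained vision encoder (e.g. a LLaVA-NeXT-style tower).
        self.vision_proj = nn.Linear(vision_dim, token_dim)
        # Stand-in for the language backbone (e.g. Qwen2.5); a tiny Transformer here.
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        # Action head: regresses (x, y) waypoints over a short planning horizon.
        self.action_head = nn.Linear(token_dim, horizon * 2)
        self.horizon = horizon

    def forward(self, camera_feats, text_embeds):
        # camera_feats: (B, num_views, vision_dim) pooled per-camera features
        # text_embeds:  (B, num_tokens, token_dim) embedded driving instruction
        vis_tokens = self.vision_proj(camera_feats)
        tokens = torch.cat([vis_tokens, text_embeds], dim=1)
        fused = self.backbone(tokens)
        # Pool the fused sequence and decode the planned waypoints.
        waypoints = self.action_head(fused.mean(dim=1))
        return waypoints.view(-1, self.horizon, 2)


if __name__ == "__main__":
    model = ToyVLADriver()
    cams = torch.randn(1, 6, 256)     # six surround-view cameras
    prompt = torch.randn(1, 8, 256)   # embedded instruction tokens
    print(model(cams, prompt).shape)  # -> torch.Size([1, 6, 2])
```

In the real system the visual tokens would come from a pretrained encoder and the backbone from a large language model such as Qwen2.5; the toy modules above only mirror the data flow from perception and language to driving actions.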

Quick Start & Requirements

The environment setup has been released, including customized builds of mmcv and mmdet3d to ensure compatibility with PyTorch 2.1.2, Transformers, and DeepSpeed. Inference code, checkpoints, and training code are planned for release.
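
As a rough starting point, the snippet below (an illustrative sketch, not an official script) checks that the dependencies named above are importable and reports their versions. Only the PyTorch 2.1.2 pin comes from the README; the other packages are left unpinned here.

```python
# Sanity-check the key dependencies mentioned in the README.
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {
    "torch": "2.1.2",      # PyTorch pin stated in the README
    "transformers": None,  # version not pinned in the summary
    "deepspeed": None,     # version not pinned in the summary
    "mmcv": None,          # customized build shipped with the repo
    "mmdet3d": None,       # customized build shipped with the repo
}

for pkg, expected in EXPECTED.items():
    try:
        found = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")
        continue
    status = "OK" if expected is None or found == expected else f"expected {expected}"
    print(f"{pkg}: {found} ({status})")
```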

Highlighted Details

  • Paper available on arXiv (2503.23463).
  • Environment setup released, with customized third-party libraries.
  • Dependencies include PyTorch 2.1.2, Transformers, and DeepSpeed.
  • Acknowledgements include LLaVA-NeXT, Qwen2.5, UniAD, mmcv, mmdet3d, GPT-Driver, Hint-AD, and TOD3Cap.

Maintenance & Community

The project is under active development, with a roadmap that includes releasing model code, checkpoints, and inference and training code.

Licensing & Compatibility

The project's license is not specified in the provided README.

Limitations & Caveats

The release of core components such as inference code, checkpoints, and training code is still pending.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 56 stars in the last 30 days
