Autonomous driving research paper integrating vision-language models
Senna integrates Large Vision-Language Models (VLMs) with end-to-end autonomous driving systems to enhance planning safety, robustness, and generalization. It targets researchers and developers in autonomous driving and AI, offering improved cross-scenario transferability and state-of-the-art planning performance.
How It Works
Senna bridges VLMs with traditional end-to-end driving models through a multi-stage fine-tuning process. A VLM (specifically LLaVA-v1.6-34b) generates scene descriptions and planning explanations, which are then used to fine-tune an end-to-end driving model. The goal is to give the driving system a more nuanced understanding of driving scenarios, leading to safer and more robust planning decisions.
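The summary above does not document the data-generation step in code, so the following is a minimal sketch only: it assumes the Hugging Face port of the model (`llava-hf/llava-v1.6-34b-hf`) and the standard `transformers` LLaVA-NeXT interface. The prompt wording and the `describe_scene` helper are illustrative, not Senna's actual code.

```python
# Minimal sketch (not Senna's actual pipeline): use LLaVA-v1.6-34b to
# produce a scene description for one camera frame. Such descriptions
# can then serve as supervision when fine-tuning the driving model.
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

MODEL_ID = "llava-hf/llava-v1.6-34b-hf"  # HF port of the VLM named above

processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def describe_scene(image_path: str) -> str:
    """Return a free-form description of a driving scene (illustrative prompt)."""
    image = Image.open(image_path)
    conversation = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": (
                "Describe this driving scene: traffic participants, road "
                "layout, and anything relevant to planning the next maneuver."
            )},
        ],
    }]
    # Build the model-specific chat prompt, then run a single generation pass.
    prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```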
Quick Start & Requirements
```bash
git clone git@github.com:hustvl/Senna.git
cd Senna
conda create -n senna python=3.10 -y
conda activate senna
pip install -r requirements.txt
```
Highlighted Details
Maintenance & Community
The project is associated with Huazhong University of Science and Technology and Horizon Robotics. The code and weights were released in December 2024.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not specify a license, which may impact commercial adoption. The data generation script relies on LLaVA-v1.6-34b, a large model that requires substantial computational resources.
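One possible mitigation for the compute requirement (an assumption on our part, not part of the Senna repo) is to load the 34B VLM with 4-bit quantization via `bitsandbytes`, which `transformers` supports out of the box:

```python
# Hypothetical mitigation, not from the Senna repo: load the 34B VLM in
# 4-bit precision so it fits on a single high-memory GPU.
import torch
from transformers import (
    BitsAndBytesConfig,
    LlavaNextForConditionalGeneration,
    LlavaNextProcessor,
)

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # 4-bit weights, fp16 compute
)

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-34b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-34b-hf",
    quantization_config=quant_config,
    device_map="auto",  # shard across available GPUs if one is not enough
)
```

Quantization trades some output quality for memory, so results may differ from the full-precision model used in the paper.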