Autonomous driving framework using vision-language models
ORION is a holistic end-to-end autonomous driving framework designed to improve decision-making in interactive, closed-loop scenarios by leveraging Vision-Language Models (VLMs). It addresses the gap between semantic reasoning and numerical trajectory output, targeting researchers and developers in autonomous driving who require robust performance in complex driving environments. ORION offers significant improvements in closed-loop evaluation metrics compared to existing state-of-the-art methods.
How It Works
ORION combines a QT-Former for long-term context aggregation, a Large Language Model (LLM) for scenario reasoning, and a generative planner for precise trajectory prediction. This design bridges the semantic reasoning and action spaces, enabling unified end-to-end optimization of both visual question-answering and planning, which yields more accurate, context-aware driving decisions.
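The three-stage flow above can be sketched as a minimal Python pipeline. This is a toy illustration under stated assumptions: the class names, the history window, the mean-pooling, and the trajectory decoding are all placeholders chosen for clarity, not ORION's actual modules or implementation.

```python
# Hypothetical sketch of a QT-Former -> LLM -> generative-planner pipeline.
# All module names, shapes, and logic here are illustrative stand-ins.
from dataclasses import dataclass, field

@dataclass
class QTFormer:
    """Aggregates per-frame visual features into a long-term context vector."""
    memory: list = field(default_factory=list)
    max_frames: int = 8  # assumed history window

    def aggregate(self, frame_features):
        self.memory.append(frame_features)
        self.memory = self.memory[-self.max_frames:]
        # Mean-pool the history into one context vector (a toy stand-in
        # for cross-attention with learned queries).
        n = len(self.memory)
        return [sum(f[i] for f in self.memory) / n
                for i in range(len(frame_features))]

class ScenarioLLM:
    """Toy stand-in for the LLM that reasons over the aggregated context."""
    def reason(self, context, question):
        # Produces a text answer (VQA) plus a "planning token" that carries
        # the scene semantics forward; here the token is just the context.
        return context, f"answered: {question}"

class GenerativePlanner:
    """Decodes the planning token into a numeric (x, y) waypoint trajectory."""
    def plan(self, planning_token, horizon=4):
        speed = sum(planning_token) / len(planning_token)
        return [(round(speed * t, 3), 0.0) for t in range(1, horizon + 1)]

def drive_step(qt_former, llm, planner, frame_features, question):
    context = qt_former.aggregate(frame_features)
    planning_token, answer = llm.reason(context, question)
    return answer, planner.plan(planning_token)

answer, traj = drive_step(QTFormer(), ScenarioLLM(), GenerativePlanner(),
                          [0.2, 0.4, 0.6], "Is it safe to proceed?")
```

The key point the sketch captures is that a single differentiable path runs from visual context through the language model's planning token to the numeric trajectory, which is what allows the VQA and planning objectives to be optimized jointly.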
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is led by researchers from Huazhong University of Science & Technology and Xiaomi EV. The arXiv paper was released on March 26, 2025, and the inference code and checkpoints on April 10, 2025.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is currently in its early stages: the training framework and Chat-B2D dataset support are not yet implemented. Closed-loop evaluation requires careful configuration of CARLA and the evaluation scripts.