Orion by xiaomi-mlab

Autonomous driving framework using vision-language models

created 4 months ago
325 stars

Top 85.0% on sourcepulse

Project Summary

ORION is a holistic end-to-end autonomous driving framework designed to improve decision-making in interactive, closed-loop scenarios by leveraging Vision-Language Models (VLMs). It addresses the gap between semantic reasoning and numerical trajectory output, targeting researchers and developers in autonomous driving who require robust performance in complex driving environments. ORION offers significant improvements in closed-loop evaluation metrics compared to existing state-of-the-art methods.

How It Works

ORION employs a unique architecture that combines a QT-Former for long-term context aggregation, a Large Language Model (LLM) for scenario reasoning, and a generative planner for precise trajectory prediction. This approach bridges the semantic reasoning and action spaces, enabling unified end-to-end optimization for both visual question-answering and planning tasks, leading to more accurate and context-aware driving decisions.
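The three-stage pipeline described above can be sketched schematically. This is a toy illustration only: the class names, shapes, and aggregation logic are hypothetical stand-ins, not the repository's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-ins for ORION's three stages; names and logic
# are illustrative, not the repository's actual implementation.

@dataclass
class QTFormer:
    """Aggregates a long history of per-frame visual features
    into a fixed number of context tokens."""
    num_queries: int = 4

    def aggregate(self, frame_features: List[List[float]]) -> List[float]:
        # Toy aggregation: mean-pool each frame, subsample to num_queries.
        pooled = [sum(f) / len(f) for f in frame_features]
        step = max(1, len(pooled) // self.num_queries)
        return pooled[::step][: self.num_queries]

@dataclass
class ReasoningLLM:
    """Produces a high-level semantic driving decision from context tokens."""
    def reason(self, context: List[float]) -> str:
        return "yield" if sum(context) > 0 else "proceed"

@dataclass
class GenerativePlanner:
    """Decodes the semantic decision into a numeric trajectory,
    bridging the reasoning space and the action space."""
    horizon: int = 6

    def plan(self, decision: str) -> List[Tuple[float, float]]:
        speed = 0.5 if decision == "yield" else 1.0
        return [(speed * t, 0.0) for t in range(1, self.horizon + 1)]

def drive(frames: List[List[float]]) -> List[Tuple[float, float]]:
    """End-to-end pass: context aggregation -> reasoning -> planning."""
    context = QTFormer().aggregate(frames)
    decision = ReasoningLLM().reason(context)
    return GenerativePlanner().plan(decision)
```

The point of the sketch is the data flow: visual history is compressed before reasoning, and the LLM's semantic output is turned back into numbers by a dedicated planner, so both stages can be optimized end to end.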

Quick Start & Requirements

  • Installation: Clone the repository, create a Conda environment, install PyTorch with CUDA 11.8 support, then install the project's dependencies.
  • Prerequisites: Python 3.8, PyTorch 2.4.1+cu118, torchvision 0.19.1+cu118, torchaudio 2.4.1. Pre-trained weights from OmniDrive and the Bench2Drive dataset are required.
  • Hardware: NVIDIA A100 recommended; a GPU with >32 GB VRAM for FP32 inference, or >17 GB VRAM for FP16 inference.
  • Links: Bench2Drive dataset preparation, CARLA setup, evaluation tools.
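The installation steps above might look like the following. The repository URL, environment name, and requirements file are assumptions; follow the project's README for the authoritative commands.

```shell
# Sketch of the setup described above -- paths and names are assumed.
git clone https://github.com/xiaomi-mlab/Orion.git
cd Orion

# Python 3.8 environment, per the stated prerequisites
conda create -n orion python=3.8 -y
conda activate orion

# PyTorch pinned to the CUDA 11.8 builds listed in Prerequisites
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 \
    --index-url https://download.pytorch.org/whl/cu118

# Remaining project dependencies (assumed requirements file)
pip install -r requirements.txt
```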

Highlighted Details

  • Achieves 77.74 Driving Score (DS) and 54.62% Success Rate (SR) on Bench2Drive, outperforming SOTA by 14.28 DS and 19.61% SR.
  • Supports ORION inference, open-loop evaluation, and closed-loop evaluation.
  • Offers both FP32 and FP16 inference modes.
  • Provides qualitative visualization and analysis comparing ORION with other methods.
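The VRAM figures above (>32 GB for FP32, >17 GB for FP16) roughly track the bytes-per-parameter halving that FP16 brings. A back-of-envelope check, where the parameter count is a hypothetical figure for illustration, not ORION's actual model size:

```python
# Back-of-envelope VRAM estimate for model weights alone (excludes
# activations, KV caches, and framework overhead).
def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    return num_params * bytes_per_param / 1024**3

PARAMS = 7_000_000_000  # assumed 7B-parameter VLM backbone (hypothetical)

fp32 = weight_memory_gb(PARAMS, 4)  # float32: 4 bytes per parameter
fp16 = weight_memory_gb(PARAMS, 2)  # float16: 2 bytes per parameter

print(f"FP32 weights: {fp32:.1f} GB, FP16 weights: {fp16:.1f} GB")
```

Weights alone halve exactly under FP16; the published requirements shrink by somewhat less because activations and runtime overhead do not all scale the same way.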

Maintenance & Community

The project is led by researchers from Huazhong University of Science & Technology and Xiaomi EV. The ArXiv paper was released on March 26, 2025, and inference code/checkpoints on April 10, 2025.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is in its early stages: the training framework and Chat-B2D dataset support are not yet released. Closed-loop evaluation requires careful configuration of CARLA and the evaluation scripts.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 107 stars in the last 90 days
