Driving benchmark for autonomous vehicles using language
DriveLM introduces a novel benchmark and baseline for autonomous driving using Graph Visual Question Answering (GVQA) and language-driven decision-making. It targets researchers and engineers in autonomous driving and multimodal AI, aiming to bridge the gap between language understanding and real-world driving actions.
How It Works
DriveLM uses a Vision-Language Model (VLM) to process scene information and natural language queries and to generate driving decisions. The approach connects perception, prediction, planning, and behavior stages through human-written reasoning logic, enabling end-to-end driving informed by language. The GVQA task specifically tests a model's ability to answer chained questions about driving scenarios based on structured scene information.
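As a rough illustration of the GVQA idea, the sketch below models question-answer pairs as nodes in a graph whose edges follow the perception → prediction → planning → behavior reasoning order, and assembles a VLM prompt by prepending answered parent questions as context. The schema and names (`QANode`, `stage`, `parents`, `build_prompt`) are illustrative assumptions, not DriveLM's actual data format or API.

```python
"""Minimal GVQA-style sketch under an assumed, simplified schema."""
from dataclasses import dataclass, field


@dataclass
class QANode:
    qid: str
    stage: str            # "perception" | "prediction" | "planning" | "behavior"
    question: str
    answer: str
    parents: list[str] = field(default_factory=list)  # edges from earlier stages


# A toy scene with one QA node per reasoning stage (hypothetical content).
scene = {
    "p1": QANode("p1", "perception", "What objects are in front of the ego car?",
                 "A pedestrian is crossing at the crosswalk."),
    "p2": QANode("p2", "prediction", "What will the pedestrian do next?",
                 "Continue crossing from right to left.", parents=["p1"]),
    "p3": QANode("p3", "planning", "What should the ego car do?",
                 "Decelerate and yield until the crosswalk is clear.", parents=["p2"]),
    "b1": QANode("b1", "behavior", "Describe the ego car's behavior.",
                 "The ego car slows down and stops before the crosswalk.", parents=["p3"]),
}


def build_prompt(graph: dict[str, QANode], qid: str) -> str:
    """Assemble a text prompt by prepending the answered parent QAs as context."""
    node = graph[qid]
    context = [f"Q: {graph[p].question}\nA: {graph[p].answer}" for p in node.parents]
    return "\n\n".join(context + [f"Q: {node.question}\nA:"])


if __name__ == "__main__":
    # The planning question is posed with the perception/prediction chain as context.
    print(build_prompt(scene, "p3"))
```

In the real benchmark the context would also include camera imagery and scene-graph annotations; the point here is only the graph-structured chaining of QA pairs across reasoning stages.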
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project is associated with OpenDriveLab and has seen active development, including releases for datasets, baseline agents, and challenge infrastructure. Further details on community engagement and roadmaps are available via the project page.
Licensing & Compatibility
Code and most assets are under the Apache 2.0 license, while the language data is licensed under CC BY-NC-SA 4.0; other datasets inherit their own licenses. The non-commercial clause on the language data may restrict certain commercial applications.
Limitations & Caveats
The project is under active development, and some components, such as the inference code for DriveLM-CARLA, are still marked as TODO. The CC BY-NC-SA 4.0 license on the language data imposes non-commercial restrictions.