DriveLM by OpenDriveLab

Driving benchmark for autonomous vehicles using language

created 2 years ago
1,114 stars

Top 35.0% on sourcepulse

Project Summary

DriveLM introduces a novel benchmark and baseline for autonomous driving using Graph Visual Question Answering (GVQA) and language-driven decision-making. It targets researchers and engineers in autonomous driving and multimodal AI, aiming to bridge the gap between language understanding and real-world driving actions.

How It Works

DriveLM uses a Vision-Language Model (VLM) to answer natural language questions about a driving scene and to generate driving actions. Questions and answers covering perception, prediction, planning, and behavior are linked into a graph through human-written reasoning logic, so that later answers can build on earlier ones and end-to-end driving is informed by language. The GVQA task tests the model's ability to answer questions about driving scenarios based on this structured scene information.
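The graph-structured question answering described above can be sketched in a few lines. This is a minimal illustration, not DriveLM's actual code: the `QANode` class, the `answer_graph` function, and the `stub_vlm` stand-in are hypothetical names introduced here, and the example questions are made up. It only shows the idea of answering questions in dependency order while passing earlier answers along as context.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List


@dataclass
class QANode:
    """One question in the graph (hypothetical structure, not DriveLM's schema)."""
    node_id: str
    stage: str  # e.g. "perception", "prediction", "planning", "behavior"
    question: str
    parents: List[str] = field(default_factory=list)  # ids this question depends on


def answer_graph(nodes: List[QANode], ask: Callable[[str, str], str]) -> Dict[str, str]:
    """Answer every node in dependency order, passing parent QA pairs as context."""
    by_id = {n.node_id: n for n in nodes}
    # TopologicalSorter takes a mapping of node -> predecessors and yields
    # predecessors before the nodes that depend on them.
    order = TopologicalSorter({n.node_id: n.parents for n in nodes}).static_order()
    answers: Dict[str, str] = {}
    for node_id in order:
        node = by_id[node_id]
        context = " ".join(
            f"Q: {by_id[p].question} A: {answers[p]}" for p in node.parents
        )
        answers[node_id] = ask(context, node.question)
    return answers


def stub_vlm(context: str, question: str) -> str:
    """Stand-in for a real model: reports how many parent answers it received."""
    return f"answer using {context.count('A:')} prior answer(s)"


# Illustrative questions only; they are not taken from the dataset.
nodes = [
    QANode("p3", "planning", "What should the ego vehicle do?", ["p1", "p2"]),
    QANode("p1", "perception", "What objects are in front of the ego vehicle?"),
    QANode("p2", "prediction", "What will the pedestrian do next?", ["p1"]),
]
answers = answer_graph(nodes, stub_vlm)
for node_id, answer in answers.items():
    print(node_id, "->", answer)
```

In a real setup, `ask` would wrap a call to a VLM that also receives the camera images; the stub is used here only so that the traversal logic can be run on its own.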

Quick Start & Requirements

  • Installation: Follow the "Getting Started" section of the repository, which covers preparing the DriveLM-nuScenes dataset and the challenge devkit.
  • Prerequisites: Requires the nuScenes dataset. Specific software dependencies are detailed within the repository.
  • Resources: The project page and Hugging Face demo provide further details and access to a test server.

Highlighted Details

  • ECCV 2024 Oral presentation.
  • Serves as a main track for the CVPR 2024 Autonomous Driving Challenge.
  • Includes DriveLM-Data, built upon nuScenes and CARLA, facilitating full-stack driving tasks with graph-structured dependencies.
  • Proposes DriveLM-Agent, a VLM-based baseline for joint Graph VQA and end-to-end driving.

Maintenance & Community

The project is associated with OpenDriveLab and has seen active development, including releases for datasets, baseline agents, and challenge infrastructure. Further details on community engagement and roadmaps are available via the project page.

Licensing & Compatibility

Code and assets are under the Apache 2.0 license, with the exception of the language data, which is licensed under CC BY-NC-SA 4.0. Other datasets inherit their own licenses. The non-commercial clause on the language data may restrict commercial applications.

Limitations & Caveats

The project is still under development, and some components, such as the inference code for DriveLM-CARLA, are marked as TODO. The CC BY-NC-SA 4.0 license on the language data restricts it to non-commercial use.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 4

Star History

  • 74 stars in the last 90 days
