DriveLM by OpenDriveLab

Driving benchmark for autonomous vehicles using language

Created 2 years ago

1,232 stars

Top 31.9% on SourcePulse


2 Experts Love This Project

Binyuan Hui

Research Scientist at Alibaba Qwen

Junyang Lin

Core Maintainer at Alibaba Qwen

Project Summary

DriveLM introduces a novel benchmark and baseline for autonomous driving using Graph Visual Question Answering (GVQA) and language-driven decision-making. It targets researchers and engineers in autonomous driving and multimodal AI, aiming to bridge the gap between language understanding and real-world driving actions.

How It Works

DriveLM uses a Vision-Language Model (VLM) that takes camera images and natural-language questions and produces driving decisions. The questions and answers are organized as a graph: perception, prediction, planning, and behavior stages are linked by human-written reasoning logic, so answers from earlier stages serve as context for later ones. This enables end-to-end driving informed by language. The GVQA task tests the model's ability to answer these interconnected questions about a driving scenario.
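To make the graph idea concrete, here is a minimal Python sketch of a QA graph in which each question lists the questions it depends on, and questions are answered in dependency order with the parent Q/A pairs prepended as context. The `QANode` class, the stage labels, the prompt format, and `answer_fn` (a stand-in for a VLM call) are illustrative assumptions, not DriveLM's actual data structures or prompting code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QANode:
    """One question-answer pair in a hypothetical GVQA-style graph."""
    qid: str
    stage: str                                        # e.g. "perception", "planning"
    question: str
    parents: list[str] = field(default_factory=list)  # qids this question depends on
    answer: str = ""


def answer_order(nodes: dict[str, QANode]) -> list[str]:
    """Topological order (Kahn's algorithm): parents always come before children."""
    indegree = {qid: len(n.parents) for qid, n in nodes.items()}
    children: dict[str, list[str]] = {qid: [] for qid in nodes}
    for qid, n in nodes.items():
        for p in n.parents:
            children[p].append(qid)
    queue = deque(sorted(q for q, d in indegree.items() if d == 0))
    order = []
    while queue:
        qid = queue.popleft()
        order.append(qid)
        for child in children[qid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle in QA graph")
    return order


def build_prompt(nodes: dict[str, QANode], qid: str) -> str:
    """Prepend each parent's Q/A pair as context for the current question."""
    node = nodes[qid]
    context = [f"Q: {nodes[p].question} A: {nodes[p].answer}" for p in node.parents]
    return "\n".join(context + [f"Q: {node.question} A:"])


def run_graph(nodes: dict[str, QANode], answer_fn) -> dict[str, QANode]:
    """Answer every question in dependency order using the supplied model stub."""
    for qid in answer_order(nodes):
        nodes[qid].answer = answer_fn(build_prompt(nodes, qid))
    return nodes
```

In use, a perception question would have no parents, a prediction question would depend on it, and a planning question would depend on both, so the planning prompt carries the earlier answers as context. Kahn's algorithm is used here simply because it also detects cycles, which a well-formed reasoning graph should not contain.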

Quick Start & Requirements

Installation: Follow the instructions in the repository's "Getting Started" section, which covers preparing the DriveLM-nuScenes dataset and the challenge devkit.
Prerequisites: Requires the nuScenes dataset. Specific software dependencies are detailed within the repository.
Resources: The project page and Hugging Face demo provide further details and access to a test server.
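Once the data is prepared, a quick sanity check is to count question-answer pairs per reasoning stage. The sketch below assumes a JSON layout of scene, then `key_frames`, then frame, then a `QA` dict mapping each stage name to a list of `{"Q": ..., "A": ...}` entries. This layout is an assumption based on the DriveLM-nuScenes release and should be checked against the files you actually download.

```python
import json
from collections import Counter

# Assumed stage keys inside each key frame's "QA" dict.
STAGES = ("perception", "prediction", "planning", "behavior")


def load(path: str) -> dict:
    """Read a DriveLM-nuScenes-style JSON file (path is a placeholder)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_qa(data: dict):
    """Yield (scene_id, frame_id, stage, question, answer) under the assumed layout."""
    for scene_id, scene in data.items():
        for frame_id, frame in scene.get("key_frames", {}).items():
            qa = frame.get("QA", {})
            for stage in STAGES:
                for pair in qa.get(stage, []):
                    yield scene_id, frame_id, stage, pair["Q"], pair["A"]


def count_by_stage(data: dict) -> Counter:
    """Count QA pairs per stage; stages with no entries read as 0."""
    return Counter(stage for _, _, stage, _, _ in iter_qa(data))
```

For example, `count_by_stage(load("train.json"))` would return a `Counter` keyed by stage; the file name is a placeholder, not a path guaranteed by the repository.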

Highlighted Details

ECCV 2024 Oral presentation.
Served as a main track of the CVPR 2024 Autonomous Driving Challenge.
Includes DriveLM-Data, built upon nuScenes and CARLA, facilitating full-stack driving tasks with graph-structured dependencies.
Proposes DriveLM-Agent, a VLM-based baseline for joint Graph VQA and end-to-end driving.

Maintenance & Community

The project is associated with OpenDriveLab and has seen active development, including releases for datasets, baseline agents, and challenge infrastructure. Further details on community engagement and roadmaps are available via the project page.

Licensing & Compatibility

Unless specified otherwise, assets and code are under the Apache 2.0 license. The language data is licensed under CC BY-NC-SA 4.0, and other datasets (including nuScenes) inherit their own licenses. The non-commercial clause on the language data may restrict commercial applications.

Limitations & Caveats

The project is actively under development, with some components like inference code for DriveLM-CARLA still marked as TODO. The CC BY-NC-SA 4.0 license for language data imposes non-commercial restrictions.

Health Check

Last Commit

6 months ago

Responsiveness

1 day

Star History

18 stars in the last 30 days