LMDrive by opendilab

Autonomous driving framework using LLMs for closed-loop control

Created 1 year ago · 788 stars · Top 45.4% on sourcepulse

Project Summary

LMDrive is a closed-loop, end-to-end autonomous driving framework that uses large language models (LLMs) to interpret multi-modal sensor data and natural language instructions. It is aimed at researchers and developers in autonomous driving, enabling a vehicle to interact with dynamic environments and to follow human guidance given as text.

How It Works

The framework uses a vision encoder (e.g., ResNet50) to process sensor inputs (camera, LiDAR) into visual tokens. These tokens are fed, together with natural language instructions (navigation commands, human notices), into an LLM backbone (e.g., LLaVA, Vicuna), which generates driving actions. The result is a language-guided, end-to-end driving policy that can adapt to complex scenarios through natural language.
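
A minimal sketch of this pipeline in PyTorch (module names, feature shapes, and the action head are illustrative assumptions, not LMDrive's actual API):

    import torch
    import torch.nn as nn

    class LanguageGuidedDriver(nn.Module):
        """Illustrative pipeline: sensor features -> visual tokens -> LLM -> action."""

        def __init__(self, vision_encoder, llm, token_dim=4096, num_actions=2):
            super().__init__()
            self.vision_encoder = vision_encoder         # e.g., a ResNet50 backbone
            self.projector = nn.Linear(2048, token_dim)  # map features to LLM width
            self.llm = llm                               # e.g., a LLaVA/Vicuna model
            self.action_head = nn.Linear(token_dim, num_actions)  # steer, throttle

        def forward(self, sensor_inputs, instruction_embeds):
            # Encode multi-modal sensor input into a sequence of visual tokens.
            feats = self.vision_encoder(sensor_inputs)    # (B, N, 2048), assumed
            visual_tokens = self.projector(feats)         # (B, N, token_dim)
            # Prepend the embedded natural-language instruction.
            tokens = torch.cat([instruction_embeds, visual_tokens], dim=1)
            hidden = self.llm(inputs_embeds=tokens).last_hidden_state
            # Decode the final hidden state into low-level controls.
            return self.action_head(hidden[:, -1])        # (B, num_actions)

In the real system the loop is closed: predicted actions are executed in the CARLA simulator, and the resulting sensor readings feed the next forward pass.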

Quick Start & Requirements

  • Installation: Requires Anaconda, cloning the repository, and setting up Python 3.8 environments for vision_encoder and LAVIS. CARLA 0.9.10.1 must also be installed.
  • Prerequisites: NVIDIA GPUs (8x A100 80GB recommended for training), CUDA, CARLA simulator.
  • Setup: Install dependencies, download CARLA, and obtain model weights (a sanity-check sketch follows this list). Training requires significant GPU resources and time (2-3 days per stage).
  • Resources: Official project page, paper, dataset, and model zoo links are provided.
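
After installation, a quick sanity check can confirm the environment; the script below is a hypothetical sketch assuming the carla Python package is installed and a CARLA server is running on the default localhost:2000:

    import carla
    import torch

    # Confirm the GPU stack is visible to PyTorch.
    assert torch.cuda.is_available(), "CUDA not detected; NVIDIA GPUs are required"
    print(f"GPUs visible: {torch.cuda.device_count()}")

    # Confirm the CARLA server is reachable (default host/port assumed).
    client = carla.Client("localhost", 2000)
    client.set_timeout(10.0)
    world = client.get_world()
    print(f"Connected to CARLA map: {world.get_map().name}")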

Highlighted Details

  • Achieved state-of-the-art driving scores on the LangAuto benchmark.
  • Supports multiple LLM backbones (LLaVA, Vicuna, LLaMA) with varying performance.
  • Includes a comprehensive dataset of ~64K driving clips with multi-modal sensor data and paired instructions (see the loader sketch after this list).
  • Provides scripts for data generation, preprocessing, parsing, training, and evaluation.
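
As a rough illustration of how such clips might be consumed, here is a hypothetical loader; the directory layout and file names (instruction.json, frames/*.png) are assumptions, not the dataset's documented format:

    import json
    from pathlib import Path

    def iter_clips(dataset_root):
        """Yield (instruction, frame_paths) pairs from an assumed layout:
        <root>/<clip_id>/instruction.json and <root>/<clip_id>/frames/*.png.
        """
        for clip_dir in sorted(Path(dataset_root).iterdir()):
            meta_file = clip_dir / "instruction.json"
            if not meta_file.is_file():
                continue
            meta = json.loads(meta_file.read_text())
            frames = sorted((clip_dir / "frames").glob("*.png"))
            yield meta.get("instruction", ""), frames

    # Example: preview the first clip's instruction and frame count.
    for instruction, frames in iter_clips("./lmdrive_data"):
        print(f"{len(frames)} frames | instruction: {instruction!r}")
        break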

Maintenance & Community

The project's paper was published at CVPR 2024. Its acknowledgements credit several foundational repositories, including InterFuser, Transfuser, and LAVIS.

Licensing & Compatibility

All code within this repository is licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

Training requires substantial computational resources (multiple high-end GPUs), and the setup process involves several complex steps, including CARLA installation and data preparation. Performance also depends on the quality of the LLM backbone and the training data.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 44 stars in the last 90 days
