LMDrive by opendilab

Autonomous driving framework using LLMs for closed-loop control

Created 1 year ago · 788 stars · Top 45.4% on sourcepulse

Project Summary

LMDrive is a closed-loop, end-to-end autonomous driving framework that uses large language models (LLMs) to interpret multi-modal sensor data and natural language instructions. It is aimed at researchers and developers in autonomous driving, enabling a vehicle to interact with dynamic environments and to follow human guidance given as text.

How It Works

The framework uses a vision encoder (e.g., ResNet50) to process sensor inputs (camera, LiDAR) into visual tokens. These tokens are fed, together with natural language instructions (navigation commands, human notices), into an LLM backbone (e.g., LLaVA, Vicuna), which generates driving actions. The result is a language-guided, end-to-end driving policy that can adapt to complex scenarios through natural language.
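
A minimal sketch of this pipeline in PyTorch (module names, feature shapes, and the action head are illustrative assumptions, not LMDrive's actual API):

    import torch
    import torch.nn as nn

    class LanguageGuidedDriver(nn.Module):
        """Illustrative pipeline: sensor features -> visual tokens -> LLM -> action."""

        def __init__(self, vision_encoder, llm, token_dim=4096, num_actions=2):
            super().__init__()
            self.vision_encoder = vision_encoder         # e.g., a ResNet50 backbone
            self.projector = nn.Linear(2048, token_dim)  # map features to LLM width
            self.llm = llm                               # e.g., a LLaVA/Vicuna model
            self.action_head = nn.Linear(token_dim, num_actions)  # steer, throttle

        def forward(self, sensor_inputs, instruction_embeds):
            # Encode multi-modal sensor input into a sequence of visual tokens.
            feats = self.vision_encoder(sensor_inputs)    # (B, N, 2048), assumed
            visual_tokens = self.projector(feats)         # (B, N, token_dim)
            # Prepend the embedded natural-language instruction.
            tokens = torch.cat([instruction_embeds, visual_tokens], dim=1)
            hidden = self.llm(inputs_embeds=tokens).last_hidden_state
            # Decode the final hidden state into low-level controls.
            return self.action_head(hidden[:, -1])        # (B, num_actions)

In the real system the loop is closed: predicted actions are executed in the CARLA simulator, and the resulting sensor readings feed the next forward pass.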

Quick Start & Requirements

  • Installation: Requires Anaconda, cloning the repository, and setting up Python 3.8 environments for vision_encoder and LAVIS. CARLA 0.9.10.1 must also be installed.
  • Prerequisites: NVIDIA GPUs (8x A100 80GB recommended for training), CUDA, CARLA simulator.
  • Setup: Install dependencies, download CARLA, and obtain model weights (a sanity-check sketch follows this list). Training requires significant GPU resources and time (2-3 days per stage).
  • Resources: Official project page, paper, dataset, and model zoo links are provided.
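
After installation, a quick sanity check can confirm the environment; the script below is a hypothetical sketch assuming the carla Python package is installed and a CARLA server is running on the default localhost:2000:

    import carla
    import torch

    # Confirm the GPU stack is visible to PyTorch.
    assert torch.cuda.is_available(), "CUDA not detected; NVIDIA GPUs are required"
    print(f"GPUs visible: {torch.cuda.device_count()}")

    # Confirm the CARLA server is reachable (default host/port assumed).
    client = carla.Client("localhost", 2000)
    client.set_timeout(10.0)
    world = client.get_world()
    print(f"Connected to CARLA map: {world.get_map().name}")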

Highlighted Details

  • Achieved state-of-the-art driving scores on the LangAuto benchmark.
  • Supports multiple LLM backbones (LLaVA, Vicuna, LLaMA) with varying performance.
  • Includes a comprehensive dataset of ~64K driving clips with multi-modal sensor data and paired instructions (see the loader sketch after this list).
  • Provides scripts for data generation, preprocessing, parsing, training, and evaluation.
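
As a rough illustration of how such clips might be consumed, here is a hypothetical loader; the directory layout and file names (instruction.json, frames/*.png) are assumptions, not the dataset's documented format:

    import json
    from pathlib import Path

    def iter_clips(dataset_root):
        """Yield (instruction, frame_paths) pairs from an assumed layout:
        <root>/<clip_id>/instruction.json and <root>/<clip_id>/frames/*.png.
        """
        for clip_dir in sorted(Path(dataset_root).iterdir()):
            meta_file = clip_dir / "instruction.json"
            if not meta_file.is_file():
                continue
            meta = json.loads(meta_file.read_text())
            frames = sorted((clip_dir / "frames").glob("*.png"))
            yield meta.get("instruction", ""), frames

    # Example: preview the first clip's instruction and frame count.
    for instruction, frames in iter_clips("./lmdrive_data"):
        print(f"{len(frames)} frames | instruction: {instruction!r}")
        break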

Maintenance & Community

The project's paper was published at CVPR 2024. Its acknowledgements credit several foundational repositories, including InterFuser, Transfuser, and LAVIS.

Licensing & Compatibility

All code within this repository is licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

Training requires substantial computational resources (multiple high-end GPUs), and the setup process involves several complex steps, including CARLA installation and data preparation. Performance also depends on the quality of the LLM backbone and the training data.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 44 stars in the last 90 days
