Autonomous driving framework using LLMs for closed-loop control
LMDrive provides a closed-loop, end-to-end autonomous driving framework that leverages large language models (LLMs) to interpret multi-modal sensor data and natural language instructions. It is aimed at researchers and developers in autonomous driving, letting a driving agent interact with dynamic environments and follow human guidance expressed in text.
How It Works
The framework uses a vision encoder (e.g., ResNet50) to process sensor inputs (camera, LiDAR) into visual tokens. These tokens are fed into a large language model backbone (e.g., LLaVA-v1.5 or Vicuna) together with natural language inputs such as navigation instructions and human notices. The LLM then predicts driving actions, yielding a language-guided, end-to-end driving policy that allows more intuitive control and adaptation to complex scenarios via natural language. A minimal sketch of this flow appears below.
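The following is a minimal illustrative sketch of the sensor → visual tokens → LLM → action loop, assuming a simplified interface; all class and method names (`VisionEncoder`, `DrivingLLM`, `drive_step`, etc.) are hypothetical placeholders and do not mirror LMDrive's actual API.

```python
# Illustrative sketch of the closed-loop pipeline; names are placeholders.
from dataclasses import dataclass
from typing import List


@dataclass
class SensorFrame:
    camera_images: list   # multi-view RGB frames for the current tick
    lidar_points: list    # LiDAR point cloud for the same tick


@dataclass
class Action:
    steer: float
    throttle: float
    brake: float


class VisionEncoder:
    """Stands in for the pre-trained vision encoder (e.g., ResNet50-based)."""

    def encode(self, frame: SensorFrame) -> List[float]:
        # The real encoder turns camera + LiDAR input into a sequence of visual tokens.
        return [0.0] * 256  # dummy token embedding


class DrivingLLM:
    """Stands in for the instruction-tuned LLM backbone (e.g., LLaVA-v1.5 / Vicuna)."""

    def predict(self, visual_tokens: List[float], instruction: str) -> Action:
        # The real model fuses visual tokens with the tokenized instruction
        # and decodes control outputs.
        return Action(steer=0.0, throttle=0.3, brake=0.0)


def drive_step(encoder: VisionEncoder, llm: DrivingLLM,
               frame: SensorFrame, instruction: str) -> Action:
    """One step: sensors -> visual tokens -> LLM -> control command."""
    tokens = encoder.encode(frame)
    return llm.predict(tokens, instruction)


if __name__ == "__main__":
    action = drive_step(VisionEncoder(), DrivingLLM(),
                        SensorFrame(camera_images=[], lidar_points=[]),
                        "Turn left at the next intersection.")
    print(action)
```

In the actual framework the predicted actions are executed in the simulator and the resulting sensor frames are fed back to the encoder, closing the loop.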
Quick Start & Requirements
Installation requires setting up separate Python environments for the vision_encoder and LAVIS components. CARLA 0.9.10.1 must also be installed.
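Once the CARLA 0.9.10.1 server is running, a quick sanity check with the CARLA Python API confirms the client can reach the simulator before launching any agent. This is only a connectivity check, not the LMDrive launch procedure; the host and port shown are CARLA's defaults and may differ in your setup.

```python
import carla

# Connect to a locally running CARLA 0.9.10.1 server (default RPC port 2000).
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)

# Both versions should report 0.9.10.1; a mismatch usually breaks the agent.
print("client version:", client.get_client_version())
print("server version:", client.get_server_version())

# Fetch the active world and map as a final connectivity check.
world = client.get_world()
print("current map:", world.get_map().name)
```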
Maintenance & Community
The project is associated with the CVPR 2024 conference. Acknowledgements list several foundational repositories, including InterFuser, Transfuser, and LAVIS.
Licensing & Compatibility
All code within this repository is licensed under the Apache License 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
Training requires substantial computational resources (multiple high-end GPUs). The setup process involves several complex steps, including CARLA installation and data preparation. The framework's performance is dependent on the quality of the LLM and the training data.