Impromptu-VLA by ahydchh

Vision-language-action models for driving

Created 4 months ago
303 stars

Top 88.2% on SourcePulse

Project Summary

Impromptu-VLA provides open-source data and code for training vision-language-action models for autonomous driving. It aims to improve driving policy robustness and safety by offering a curated dataset and benchmarks, targeting researchers and developers in the autonomous driving and AI fields.

How It Works

The project leverages a novel dataset designed to enhance the understanding of complex road interactions for AI driving agents. It integrates with established LLM serving (sglang) and fine-tuning (LLaMA-Factory) frameworks, utilizing vLLM for efficient inference. This approach enables training and evaluating specialized driving models that match or exceed closed-source APIs.
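As a concrete illustration of this serving stack, a fine-tuned checkpoint could be hosted with sglang and queried through its OpenAI-compatible endpoint. This is a minimal sketch: the checkpoint path, model name, and port below are hypothetical placeholders, not values taken from the repository.

```shell
# Launch an sglang server for a fine-tuned checkpoint
# (checkpoint path and port are hypothetical placeholders).
python -m sglang.launch_server \
  --model-path ./checkpoints/impromptu-vla-7b \
  --port 30000

# Query the OpenAI-compatible chat endpoint once the server is up.
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Describe the road scene."}]}'
```

A GPU sized for the chosen checkpoint is assumed, per the underlying libraries' requirements.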

Quick Start & Requirements

  • Environment Configuration: Requires setting up environments for sglang, LLaMA-Factory, and vLLM. A provided environment.yaml file lists all necessary Conda and pip packages.
  • Data Preparation: Involves organizing raw data, creating symbolic links for datasets like navsim, and running data generation scripts.
  • Training: Use llamafactory-cli train <yaml_path>.
  • Inference: Use python train/inference_scripts/sglang_infer.py.
  • Dependencies: Python, Conda, sglang, LLaMA-Factory, vLLM. Specific hardware requirements (e.g., GPUs) are implied by the underlying libraries.
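Taken together, the steps above might look like the following shell session. Only `environment.yaml`, `llamafactory-cli train`, and `train/inference_scripts/sglang_infer.py` come from the repository; the environment name, dataset paths, and training YAML filename are illustrative assumptions.

```shell
# 1. Create the Conda environment from the provided spec
#    (the environment name "impromptu-vla" is an assumption).
conda env create -f environment.yaml
conda activate impromptu-vla

# 2. Link an existing navsim download into the expected data layout
#    (the source path is a placeholder for wherever navsim lives on disk).
ln -s /data/navsim ./data/navsim

# 3. Fine-tune with LLaMA-Factory (the YAML filename is illustrative).
llamafactory-cli train configs/impromptu_vla_7b.yaml

# 4. Run inference with the provided sglang script.
python train/inference_scripts/sglang_infer.py
```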

Highlighted Details

  • Achieves state-of-the-art performance on open-loop trajectory prediction (nuScenes) and closed-loop driving simulation (NeuroNCAP), outperforming generalist VLMs and rivaling specialized driving models.
  • Offers pre-trained 3B and 7B parameter models, fine-tuned on various combinations of nuScenes and the Impromptu dataset.
  • Provides scripts for data organization, open-loop evaluation (nuScenes), and closed-loop evaluation (NeuroNCAP).
  • Includes video comparisons demonstrating improved driving behavior (e.g., collision avoidance) with the Impromptu dataset.

Maintenance & Community

The project originates from AIR, Tsinghua University, with contributions from Bosch Research. Links to Hugging Face Hub for models are provided. Further community or roadmap details are not explicitly mentioned in the README.

Licensing & Compatibility

The README does not explicitly state the license for the code or data. However, the use of libraries like LLaMA-Factory and vLLM suggests compatibility with common open-source AI development workflows. Users should verify specific licensing terms for all components.

Limitations & Caveats

The project focuses on specific autonomous driving benchmarks (nuScenes, NeuroNCAP) and may require significant effort to adapt to other domains or datasets. Detailed setup and data organization steps are crucial for successful implementation.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 11
  • Star History: 29 stars in the last 30 days
