AutoVLA by ucla-mobility

Vision-Language-Action model for end-to-end autonomous driving

Created 8 months ago

407 stars

Top 71.7% on SourcePulse

Project Summary

AutoVLA is a Vision-Language-Action (VLA) model for end-to-end autonomous driving that combines adaptive reasoning with reinforcement fine-tuning. Aimed at researchers and engineers, it generates trajectories directly in a single autoregressive process, with the goal of improving both planning performance and runtime efficiency.

How It Works

The model uses a single autoregressive generative process that combines chain-of-thought (CoT) reasoning with physical action tokenization. Supervised fine-tuning (SFT) gives it two thinking modes: a fast mode that outputs the trajectory only, and a slow mode that first produces CoT reasoning. Reinforcement fine-tuning (RFT) with Group Relative Policy Optimization (GRPO) then improves planning performance and runtime efficiency by reducing unnecessary reasoning. As a result, the model can switch between the two modes rather than always reasoning at full length.
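To make the GRPO step more concrete, here is a minimal sketch of the group-relative advantage computation that gives the method its name: several outputs are sampled for the same input, and each reward is normalized against the mean and standard deviation of its own group. This is not AutoVLA's training code. The function name, the reward values, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (GRPO-style).

    Assumption: population standard deviation is used; implementations
    may differ in this detail.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four trajectories sampled from the same scene.
rewards = [0.9, 0.7, 0.4, 0.8]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive positive advantages and are reinforced, while those below it are discouraged. If the reward also penalized long reasoning, this same mechanism would push the policy toward the fast mode wherever extra reasoning does not pay off.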

Quick Start & Requirements

Setup involves four steps: downloading large datasets (nuPlan, Waymo E2E, nuScenes, and the DriveLM annotations), creating a conda environment from environment.yml, installing the package with pip install -e ., and configuring Navsim through environment variables. Pretrained Qwen2.5-VL models are required. nuScenes preprocessing needs its own conda environment because of dependency conflicts. GPU/CUDA support appears to be required but is not stated explicitly. Key links: Website (https://autovla.github.io/), Paper (https://arxiv.org/abs/2506.13757).
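Because the Navsim configuration depends on environment variables, a quick pre-flight check can save a failed run. The sketch below reports which variables are unset. The variable names are an assumption taken from NAVSIM's own installation instructions, not from the AutoVLA repository, so confirm them against the project README; the helper name is hypothetical.

```python
import os

# Assumed variable names, based on NAVSIM's installation docs.
# Verify against the AutoVLA README before relying on this list.
NAVSIM_ENV_VARS = (
    "NUPLAN_MAPS_ROOT",
    "NAVSIM_EXP_ROOT",
    "NAVSIM_DEVKIT_ROOT",
    "OPENSCENE_DATA_ROOT",
)


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(NAVSIM_ENV_VARS)
    if missing:
        print("Unset environment variables: " + ", ".join(missing))
    else:
        print("All listed environment variables are set.")
```

Passing a plain dictionary as environ makes the helper easy to test without touching the real shell environment.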

Highlighted Details

Achieved high rankings in the Waymo Vision-based End-to-end Driving Challenge (May 2025).
Demonstrates competitive performance across nuPlan, nuScenes, Waymo, and CARLA benchmarks in open and closed-loop settings.
Features adaptive reasoning with dynamic switching between fast and slow thinking modes.

Maintenance & Community

Developed by UCLA researchers, with codebase released Feb 2026 and checkpoints planned for March 2026. Reasoning data availability is pending approval. No community channels or explicit roadmap are detailed beyond release plans.

Licensing & Compatibility

Released under an "Academic Software License," restricting commercial use. Compatibility with closed-source projects may be limited.

Limitations & Caveats

The "Academic Software License" rules out commercial adoption. Reasoning data is not yet available. Setup is complex: it spans multiple datasets and conda environments and is prone to dependency conflicts. The codebase was only released in Feb 2026, so the project is still under active development.

Health Check

Last Commit

3 weeks ago

Responsiveness

Inactive

Star History

36 stars in the last 30 days