ucla-mobility: Vision-Language-Action model for end-to-end autonomous driving
Top 71.7% on SourcePulse
Summary
AutoVLA introduces a Vision-Language-Action (VLA) model for end-to-end autonomous driving, integrating adaptive reasoning and reinforcement fine-tuning. It targets researchers and engineers, offering a unified autoregressive process for direct trajectory generation with improved planning performance and runtime efficiency.
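Direct trajectory generation depends on representing continuous motion as discrete tokens that a language model can emit. The sketch below assumes a hypothetical codebook of (dx, dy, dyaw) motion primitives in the vehicle's local frame. Tokenizing picks the nearest primitive for each step, and detokenizing chains the primitives back into global waypoints. The codebook layout, the function names, and the step semantics are illustrative assumptions. AutoVLA's actual action vocabulary is defined in its paper and repository.

```python
import numpy as np


def tokenize(deltas: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each (dx, dy, dyaw) step to the index of its nearest codebook entry."""
    # deltas: (T, 3), codebook: (K, 3) -> pairwise distances: (T, K)
    dists = np.linalg.norm(deltas[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)


def detokenize(tokens, codebook: np.ndarray, start=(0.0, 0.0, 0.0)) -> np.ndarray:
    """Chain the selected primitives into global (x, y, yaw) waypoints."""
    x, y, yaw = start
    waypoints = []
    for token in tokens:
        dx, dy, dyaw = codebook[token]
        # Rotate the local displacement into the global frame before adding it.
        x += dx * np.cos(yaw) - dy * np.sin(yaw)
        y += dx * np.sin(yaw) + dy * np.cos(yaw)
        yaw += dyaw
        waypoints.append((x, y, yaw))
    return np.array(waypoints)


# Toy 3-entry codebook (hypothetical): straight, gentle left turn, stop.
codebook = np.array([[1.0, 0.0, 0.0],
                     [0.5, 0.0, 0.1],
                     [0.0, 0.0, 0.0]])
observed_steps = np.array([[0.9, 0.05, 0.0],
                           [0.45, 0.0, 0.12]])
tokens = tokenize(observed_steps, codebook)
print(tokens, detokenize(tokens, codebook))
```

Because each primitive is applied in the vehicle's current heading frame, a token means the same maneuver wherever the vehicle is pointing. This is one reason a modest vocabulary can cover many trajectories in this toy scheme.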
How It Works
The model uses a single autoregressive generative process that combines chain-of-thought (CoT) reasoning with physical action tokenization. Supervised fine-tuning (SFT) enables two thinking modes: fast (trajectory only) and slow (CoT-enhanced). Reinforcement fine-tuning (RFT) with Group Relative Policy Optimization (GRPO) then improves planning performance and runtime efficiency by discouraging unnecessary reasoning. The result is adaptive reasoning, where the model can switch between the fast and slow modes depending on the driving scenario.
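GRPO replaces a learned value baseline with a group statistic. Several responses are sampled for the same scene, and each response's advantage is its reward normalized by the group's mean and standard deviation. The sketch below shows that normalization and the PPO-style clipped term it feeds. The reward shaping (a planning score minus a per-token penalty on reasoning length), the function names, and the coefficient values are hypothetical illustrations of "discouraging unnecessary reasoning", not AutoVLA's published reward. The KL penalty toward a reference policy is omitted.

```python
import numpy as np


def shaped_reward(planning_score: float, num_reasoning_tokens: int,
                  length_penalty: float = 1e-3) -> float:
    """Hypothetical reward: driving quality minus a cost per CoT token."""
    return planning_score - length_penalty * num_reasoning_tokens


def group_relative_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    """Normalize rewards within one sampled group (GRPO's baseline)."""
    r = np.asarray(rewards, dtype=float)
    # eps keeps the division finite when every sample has the same reward.
    return (r - r.mean()) / (r.std() + eps)


def clipped_surrogate(logp_new, logp_old, advantages, clip: float = 0.2):
    """Per-sample PPO-style clipped objective (to be maximized)."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    adv = np.asarray(advantages)
    return np.minimum(ratio * adv, np.clip(ratio, 1 - clip, 1 + clip) * adv)


# One group of four sampled responses for the same driving scene:
# (planning score, number of reasoning tokens). The values are made up.
samples = [(0.90, 0), (0.85, 0), (0.92, 300), (0.60, 250)]
rewards = [shaped_reward(score, n) for score, n in samples]
advantages = group_relative_advantages(rewards)
print(np.round(advantages, 3))
```

Under this shaping, the fast, trajectory-only answer (score 0.90) receives the largest advantage even though a long CoT answer scored slightly higher (0.92). That is the kind of pressure that pushes a policy to reason only when the reasoning pays off.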
Quick Start & Requirements
Setup requires downloading large datasets (nuPlan, Waymo E2E, nuScenes, DriveLM annotations), creating a conda environment (environment.yml), installing the package (pip install -e .), and configuring Navsim through environment variables. Pretrained Qwen2.5-VL model weights are also required. nuScenes preprocessing needs a separate conda environment because of dependency conflicts. A CUDA-capable GPU appears to be required, although the requirements do not state this explicitly. Key links: Website (https://autovla.github.io/), Paper (https://arxiv.org/abs/2506.13757).
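Because the Navsim configuration is driven by environment variables, a quick pre-flight check can catch a missing variable before a long job starts. The variable names below follow the convention in the NAVSIM devkit's own setup instructions. AutoVLA's setup may use a different or extended set, so treat the list as an assumption and adjust it to the repository's README. The helper name `missing_env_vars` is hypothetical.

```python
import os

# Assumed variable names, taken from the NAVSIM devkit's setup convention;
# check AutoVLA's README for the exact set it expects.
NAVSIM_ENV_VARS = (
    "NUPLAN_MAP_VERSION",
    "NUPLAN_MAPS_ROOT",
    "NAVSIM_EXP_ROOT",
    "NAVSIM_DEVKIT_ROOT",
    "OPENSCENE_DATA_ROOT",
)


def missing_env_vars(env=None, required=NAVSIM_ENV_VARS):
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the check easy to test and lets it validate a prepared `.env`-style dictionary before export.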
Highlighted Details
Maintenance & Community
Developed by UCLA researchers, with codebase released Feb 2026 and checkpoints planned for March 2026. Reasoning data availability is pending approval. No community channels or explicit roadmap are detailed beyond release plans.
Licensing & Compatibility
Released under an "Academic Software License," restricting commercial use. Compatibility with closed-source projects may be limited.
Limitations & Caveats
The "Academic Software License" restricts commercial adoption. Reasoning data is not yet available. Setup is complex, involving multiple datasets, environments, and potential dependency conflicts. The project's recent release (Feb 2026) indicates ongoing development.
3 weeks ago
Inactive