xiaomi-research: Autonomous driving trajectory prediction framework
OneVL is a Vision-Language-Action (VLA) framework for autonomous driving, designed to overcome the trade-off between interpretability and speed in trajectory prediction models. It achieves state-of-the-art accuracy with inference latency comparable to faster, non-interpretable models. Because it delivers both accurate predictions and explainable reasoning at that latency, it is suitable for real-time applications.
How It Works
The core innovation is a pair of dual-modal auxiliary decoders that supervise compact latent tokens during training. A language auxiliary decoder reconstructs explicit Chain-of-Thought (CoT) reasoning from the language latents, while a visual auxiliary decoder predicts future scene frames from the visual latents, acting as a world model. At inference, these decoders are removed and all latent tokens are prefilled in a single parallel pass. As a result, OneVL runs at the speed of answer-only autoregressive (AR) prediction while retaining interpretability, and it avoids the performance degradation that prior latent CoT methods showed on driving tasks.
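The toy sketch below illustrates the two ideas in this paragraph in standard-library Python; it is not OneVL's code. The training objective is assumed to be a weighted sum of the main trajectory loss and the two auxiliary losses (the names `AuxLosses`, `w_lang`, and `w_vis` are hypothetical), and a simple cost model (one prefill pass plus one pass per decoded token, also an assumption) counts sequential forward passes to show why prefilling all latents in one parallel pass matches answer-only decoding.

```python
from dataclasses import dataclass


@dataclass
class AuxLosses:
    """Per-batch loss terms (hypothetical names, for illustration only)."""
    trajectory: float     # main trajectory/action prediction loss
    language_cot: float   # language aux decoder: reconstruct explicit CoT from language latents
    visual_future: float  # visual aux decoder: predict future frames from visual latents


def training_objective(losses: AuxLosses, w_lang: float = 1.0, w_vis: float = 1.0) -> float:
    """Weighted sum of the main loss and both auxiliary losses (assumed form).

    The auxiliary decoders exist only at training time; their losses shape
    what the compact latent tokens encode.
    """
    return losses.trajectory + w_lang * losses.language_cot + w_vis * losses.visual_future


def sequential_passes(mode: str, answer_tokens: int, cot_tokens: int = 0, latent_tokens: int = 0) -> int:
    """Toy cost model: 1 prefill pass + 1 pass per autoregressively decoded token."""
    if mode == "answer_only":
        # Prefill the prompt once, then decode each answer token in turn.
        return 1 + answer_tokens
    if mode == "explicit_cot":
        # Every reasoning token is decoded one at a time before the answer.
        return 1 + cot_tokens + answer_tokens
    if mode == "latent_prefill":
        # Latent tokens are appended to the prompt and processed in the same
        # parallel prefill pass: latent_tokens widens that pass but adds no
        # sequential steps, and the auxiliary decoders are not run at all.
        if latent_tokens < 0:
            raise ValueError("latent_tokens must be non-negative")
        return 1 + answer_tokens
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(training_objective(AuxLosses(0.8, 1.2, 0.5), w_lang=0.5, w_vis=0.5))
    for mode in ("answer_only", "explicit_cot", "latent_prefill"):
        print(mode, sequential_passes(mode, answer_tokens=10, cot_tokens=200, latent_tokens=64))
```

With 10 answer tokens, 200 reasoning tokens, and 64 latents, this cost model gives 11, 211, and 11 sequential passes for the three modes. These are illustrative step counts, not measured latencies.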
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. Key requirements include torch==2.10.0, transformers==4.57.0, and omegaconf>=2.3.0.
Highlighted Details
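Before running anything, it can help to confirm that the environment matches these versions. The script below is a minimal sketch, not part of the project: it reads installed versions with the standard `importlib.metadata` module and applies a simplified comparison that handles only `==` and `>=` on numeric release segments (pre-release tags and full PEP 440 rules are ignored). The `REQUIREMENTS` table copies the pins listed here; the limitations section phrases the transformers requirement as >= 4.57.0, so change that operator if the looser bound is what you need.

```python
from importlib import metadata
from typing import Dict, List, Optional, Tuple

# Requirements as listed in this summary; only "==" and ">=" are supported here.
REQUIREMENTS: Dict[str, Tuple[str, str]] = {
    "torch": ("==", "2.10.0"),
    "transformers": ("==", "4.57.0"),
    "omegaconf": (">=", "2.3.0"),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep the leading numeric release segments, e.g. '2.10.0+cu128' -> (2, 10, 0)."""
    parts: List[int] = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at non-numeric segments such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed: str, op: str, wanted: str) -> bool:
    """Compare versions after zero-padding both to the same length."""
    a, b = parse_version(installed), parse_version(wanted)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    if op == "==":
        return a == b
    if op == ">=":
        return a >= b
    raise ValueError(f"unsupported operator: {op}")


def check(installed: Dict[str, Optional[str]]) -> List[str]:
    """Return one message per missing or mismatched package (empty list means OK)."""
    problems: List[str] = []
    for name, (op, wanted) in REQUIREMENTS.items():
        version = installed.get(name)
        if version is None:
            problems.append(f"{name}: not installed (need {op}{wanted})")
        elif not satisfies(version, op, wanted):
            problems.append(f"{name}: found {version}, need {op}{wanted}")
    return problems


def installed_versions() -> Dict[str, Optional[str]]:
    """Look up each required distribution in the current environment."""
    found: Dict[str, Optional[str]] = {}
    for name in REQUIREMENTS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    issues = check(installed_versions())
    print("\n".join(issues) if issues else "all listed requirements satisfied")
```

Keeping `check` separate from `installed_versions` lets the comparison logic be exercised with a plain dictionary, without the packages being installed.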
Maintenance & Community
The README lists the authors but provides no community channels (e.g., Discord, Slack) or details on active maintenance.
Licensing & Compatibility
Limitations & Caveats
Requires transformers 4.57.0 or later (>= 4.57.0). Generating visual explanations requires downloading external models for the Emu3.5 VQ-VAE. Performance depends critically on the staged training methodology.