WAM-Diff (fudan-generative-vision): Masked diffusion VLA framework for autonomous driving
Top 95.9% on SourcePulse
WAM-Diff presents a Vision-Language-Action (VLA) framework designed to advance autonomous driving capabilities. It addresses the complex tasks of perception and decision-making by integrating masked diffusion models with a Mixture of Experts (MoE) architecture and online reinforcement learning. The project targets researchers and engineers in the autonomous driving sector, aiming to provide improved predictive accuracy and navigation robustness through a unified system.
How It Works
The framework employs a masked diffusion model to generate driving actions and waypoints, conditioned on visual inputs and navigation directives. It incorporates a Mixture of Experts (MoE) architecture to efficiently process diverse driving scenarios and utilizes online reinforcement learning to continuously refine its decision-making policies based on real-time feedback, thereby fostering more adaptive and reliable autonomous driving systems.
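The repository's actual network, tokenization, and conditioning are not shown on this page, but the decoding scheme described above can be illustrated with a toy sketch. The snippet below shows the confidence-based iterative unmasking loop that discrete masked-diffusion decoders typically use to generate a sequence of waypoint tokens; `toy_denoiser`, the vocabulary size, and all other names are hypothetical stand-ins, not WAM-Diff's API.

```python
import numpy as np

MASK = -1      # sentinel for a still-masked waypoint token
VOCAB = 16     # toy discrete action/waypoint vocabulary
SEQ_LEN = 8    # number of waypoint tokens to generate

rng = np.random.default_rng(0)

def toy_denoiser(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the conditioned VLA network: returns logits over
    the action vocabulary for every position (random here, for illustration)."""
    return rng.normal(size=(len(tokens), VOCAB))

def masked_diffusion_sample(steps: int = 4) -> np.ndarray:
    """Start fully masked, then repeatedly predict all positions and
    commit ('unmask') the most confident ones until none remain."""
    tokens = np.full(SEQ_LEN, MASK)
    for step in range(steps):
        logits = toy_denoiser(tokens)
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        confidence = probs.max(-1)
        prediction = probs.argmax(-1)
        masked = np.flatnonzero(tokens == MASK)
        # unmask an even share of the remaining masked slots each step
        k = max(1, int(np.ceil(len(masked) / (steps - step))))
        chosen = masked[np.argsort(confidence[masked])[::-1][:k]]
        tokens[chosen] = prediction[chosen]
    return tokens

waypoints = masked_diffusion_sample()
```

In a real system the denoiser would be conditioned on camera features and a navigation command, and the unmasking schedule, number of steps, and token vocabulary would be learned design choices rather than the fixed toy values used here.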
Quick Start & Requirements
Clone the repository with `git clone https://github.com/fudan-generative-vision/WAM-Diff`, navigate into the directory, and set up the environment using `bash init_env.sh` (Conda) or `uv venv && uv sync` (uv). Run the inference demo with `bash inf.sh`; model checkpoints are expected in the `./model/` directory.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Last update: 2 months ago · Activity status: Inactive