WAM-Diff by fudan-generative-vision

Masked diffusion VLA framework for autonomous driving

Created 4 months ago
267 stars

Top 95.9% on SourcePulse

Project Summary

WAM-Diff is a Vision-Language-Action (VLA) framework for autonomous driving. It combines a masked diffusion model with a Mixture of Experts (MoE) architecture and online reinforcement learning in a single system that covers both perception and decision-making. It is aimed at researchers and engineers working on autonomous driving, with the stated goal of improving prediction accuracy and navigation robustness.

How It Works

The framework uses a masked diffusion model to generate driving actions and waypoints, conditioned on visual inputs and navigation directives. A Mixture of Experts (MoE) architecture handles diverse driving scenarios, and online reinforcement learning refines the decision-making policy from real-time feedback.
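To make the masked-diffusion idea concrete, here is a toy sketch of iterative unmasking in plain Python. It is not the project's implementation. It assumes that actions or waypoints are encoded as discrete tokens (the summary does not say so), and every name in it (MASK, toy_predictor, masked_diffusion_decode) is hypothetical. The predictor returns random guesses in place of a network conditioned on camera input and the navigation directive.

```python
import math
import random

MASK = -1  # sentinel for a still-masked position (assumption for this sketch)


def toy_predictor(tokens, vocab_size, rng):
    """Stand-in for the real network: a (token, confidence) guess per masked slot.

    A real model would condition on visual features and the navigation
    directive; this one draws random guesses so the loop runs end to end.
    """
    return {
        i: (rng.randrange(vocab_size), rng.random())
        for i, t in enumerate(tokens)
        if t == MASK
    }


def masked_diffusion_decode(length, vocab_size, steps, seed=0):
    """Start from a fully masked sequence and unmask it over `steps` rounds."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        guesses = toy_predictor(tokens, vocab_size, rng)
        if not guesses:
            break  # nothing left to unmask
        # Reveal an even share of the remaining masked slots each round,
        # so the final round reveals whatever is left.
        n_reveal = math.ceil(len(guesses) / (steps - step))
        # Commit the most confident guesses first; the rest stay masked.
        ranked = sorted(guesses.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:n_reveal]:
            tokens[pos] = tok
    return tokens


if __name__ == "__main__":
    print(masked_diffusion_decode(length=8, vocab_size=16, steps=4))
```

Committing the most confident positions first is one common unmasking schedule for masked diffusion decoders. Whether WAM-Diff uses this schedule or a different one is not stated in the summary.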

Quick Start & Requirements

  • Primary install/run commands:
      1. Clone the repository: git clone https://github.com/fudan-generative-vision/WAM-Diff
      2. Change into the cloned directory.
      3. Set up the environment with either bash init_env.sh (Conda) or uv venv && uv sync (uv).
      4. Run the inference demo: bash inf.sh
  • Non-default prerequisites: Download the pretrained WAM-Diff and Siglip2 models from Hugging Face Hub into the ./model/ directory.
  • Links: Huggingface WAM-Diff, Huggingface Siglip2.
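Because the demo depends on model files under ./model/, a small pre-flight check can catch a missing download before the demo is launched. The sketch below is an illustration only. The helper name missing_model_dirs and the sub-directory names passed to it are placeholders, since the summary does not say how the downloaded models are laid out inside ./model/.

```python
from pathlib import Path


def missing_model_dirs(model_root, expected_names):
    """Return the expected sub-directories that are absent or empty."""
    root = Path(model_root)
    missing = []
    for name in expected_names:
        candidate = root / name
        # Treat a directory with no entries the same as a missing one.
        if not candidate.is_dir() or not any(candidate.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Placeholder folder names; adjust to the actual layout of ./model/.
    problems = missing_model_dirs("./model", ["WAM-Diff", "siglip2"])
    if problems:
        print("Missing or empty model folders:", ", ".join(problems))
    else:
        print("Model folders found.")
```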

Highlighted Details

  • Features qualitative results on the NAVSIM benchmark.
  • Provides a quick inference demo for immediate testing.
  • Pretrained models are available on Hugging Face Hub.

Maintenance & Community

  • Roadmap: Key milestones include the release of inference code, SFT/inf code, and pretrained models (scheduled for completion by February 1, 2026). NAVSIM evaluation code and RL code are marked as TBD.
  • Acknowledgements: The project acknowledges contributions from the LLaDA-V repositories.
  • No explicit community channels (e.g., Discord, Slack) or direct social media links are provided in the README.

Licensing & Compatibility

  • The README does not specify a software license. This absence creates ambiguity regarding usage rights, modification, and distribution, particularly for commercial applications.

Limitations & Caveats

  • Functionalities such as NAVSIM evaluation code and RL code are marked as TBD on the project roadmap, indicating they may not be fully implemented or released.
  • The lack of explicit licensing information is a significant caveat for adoption.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2

Star History

  • 95 stars in the last 30 days
