WAM-Diff by fudan-generative-vision

Masked diffusion VLA framework for autonomous driving

Created 6 months ago

289 stars

Top 90.9% on SourcePulse

Project Summary

WAM-Diff is a Vision-Language-Action (VLA) framework for autonomous driving. It combines masked diffusion models with a Mixture of Experts (MoE) architecture and online reinforcement learning to handle perception and decision-making in a single system. The project targets researchers and engineers in autonomous driving and aims to improve predictive accuracy and navigation robustness.

How It Works

The framework uses a masked diffusion model to generate driving actions and waypoints, conditioned on visual inputs and navigation directives. A Mixture of Experts (MoE) architecture handles diverse driving scenarios efficiently, and online reinforcement learning refines the decision-making policy from real-time feedback.

Quick Start & Requirements

Primary install/run command: Clone the repository (git clone https://github.com/fudan-generative-vision/WAM-Diff), navigate into the directory, set up the environment using bash init_env.sh (Conda) or uv venv && uv sync (uv), and run the inference demo with bash inf.sh.
Non-default prerequisites: Download the pretrained WAM-Diff and Siglip2 models from Hugging Face Hub into the ./model/ directory before running inference.
Links: Hugging Face WAM-Diff, Hugging Face Siglip2.
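
The steps above can be sketched as a short shell script. This is a minimal sketch, not the project's official installer: the function names models_present and quick_start are hypothetical helpers introduced here for illustration, and the check only confirms that ./model/ exists and is non-empty, since the summary does not say how the downloaded model folders are named.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the repository): succeeds only if the
# given directory exists and contains at least one entry.
models_present() {
  [ -d "${1:-./model}" ] && [ -n "$(ls -A "${1:-./model}" 2>/dev/null)" ]
}

# Hypothetical wrapper around the quick-start commands listed above.
# Nothing runs until quick_start is called explicitly.
quick_start() {
  git clone https://github.com/fudan-generative-vision/WAM-Diff || return 1
  cd WAM-Diff || return 1

  # Conda route; the uv alternative from the summary is: uv venv && uv sync
  bash init_env.sh || return 1

  # The pretrained WAM-Diff and Siglip2 models must already be in ./model/.
  if ! models_present ./model; then
    echo "Download the pretrained models into ./model/ first." >&2
    return 1
  fi

  # Run the inference demo.
  bash inf.sh
}
```

Sourcing the script and then calling quick_start runs the full sequence. The model check is kept separate so it can be reused on its own before launching the demo.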

Highlighted Details

Features qualitative results on the NAVSIM benchmark.
Provides a quick inference demo for immediate testing.
Pretrained models are available on Hugging Face Hub.

Maintenance & Community

Roadmap: Key milestones include the release of inference code, SFT/inf code, and pretrained models (scheduled for completion by February 1, 2026). NAVSIM evaluation code and RL code are marked as TBD.
Acknowledgements: The project acknowledges contributions from the LLaDA-V repositories.
No explicit community channels (e.g., Discord, Slack) or direct social media links are provided in the README.

Licensing & Compatibility

The README does not specify a software license. This absence creates ambiguity regarding usage rights, modification, and distribution, particularly for commercial applications.

Limitations & Caveats

Functionalities such as NAVSIM evaluation code and RL code are marked as TBD on the project roadmap, indicating they may not be fully implemented or released.
The lack of explicit licensing information is a significant caveat for adoption.

Health Check

Last Commit: 3 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

7 stars in the last 30 days
