LMD0311: Unified self-driving world model for 3D scene understanding and generation
Top 98.5% on SourcePulse
HERMES presents a unified Driving World Model (DWM) designed to overcome the limitations of existing DWMs by integrating simultaneous 3D scene understanding with future scene generation. This project targets researchers and engineers in autonomous driving, offering a framework that not only predicts future driving scenarios but also interprets the current environment, aiming to enhance the safety and decision-making capabilities of self-driving systems.
How It Works
HERMES uses a Bird's-Eye View (BEV) representation to consolidate multi-view spatial information while preserving geometric relationships and inter-object interactions. It also introduces "world queries," which incorporate world knowledge into the BEV features through causal attention within a Large Language Model (LLM). This contextual enrichment improves both the understanding of the current scene and the generation of future scene evolutions.
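To make the causal-attention idea concrete, here is a minimal toy sketch in plain Python. It is not the HERMES implementation: the function names, the identity projections, the tiny 2-D vectors, and the assumption that world queries are appended after the BEV tokens in the sequence are all illustrative choices. It only shows how a causal mask lets later positions (the hypothetical world queries) read from earlier ones (the BEV tokens) while leaving the earlier ones unaffected.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (toy only)."""
    n, d = len(tokens), len(tokens[0])
    mask = causal_mask(n)
    out = []
    for i in range(n):
        # Only positions allowed by the causal mask are visible to token i.
        visible = [j for j in range(n) if mask[i][j]]
        scores = [
            sum(a * b for a, b in zip(tokens[i], tokens[j])) / math.sqrt(d)
            for j in visible
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * tokens[j][k] for w, j in zip(weights, visible)) for k in range(d)]
        )
    return out


# Hypothetical toy inputs: three "BEV tokens" followed by two "world queries".
bev_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
world_queries = [[0.5, 0.5], [0.2, 0.8]]

sequence = bev_tokens + world_queries  # assumed ordering: BEV first, queries last
enriched = causal_attention(sequence)
enriched_queries = enriched[len(bev_tokens):]  # queries after reading the BEV context
```

Under this assumed ordering, each world query aggregates information from every BEV token, while the outputs at the BEV positions do not depend on the queries at all. A real model would use learned projections, multiple heads, and far larger feature dimensions.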
Quick Start & Requirements
The repository provides detailed guides for environment setup, data preparation, weight downloading, training, inference, and evaluation, but the README itself does not list specific installation commands. The project uses datasets such as nuScenes and OmniDrive-nuScenes.
Highlighted Details
Maintenance & Community
The project announced its acceptance to ICCV 2025 and the open-sourcing of its code and weights in July 2025. HERMES builds upon foundational work from BEVFormer v2, InternVL, UniPAD, OmniDrive, and DriveMonkey. No direct links to community channels like Discord or Slack are provided.
Licensing & Compatibility
The code is released under the Apache 2.0 license. This license is permissive and generally compatible with commercial use and linking within closed-source projects.
Limitations & Caveats
The project's "To Do" list indicates that DeepSpeed support has not yet been released, which may limit distributed training options. Because the project builds on several external frameworks, its dependencies may be complex to set up.