HERMES by LMD0311

Unified self-driving world model for 3D scene understanding and generation

Created 1 year ago
256 stars

Top 98.5% on SourcePulse

Project Summary

HERMES presents a unified Driving World Model (DWM) designed to overcome the limitations of existing DWMs by integrating simultaneous 3D scene understanding with future scene generation. This project targets researchers and engineers in autonomous driving, offering a framework that not only predicts future driving scenarios but also interprets the current environment, aiming to enhance the safety and decision-making capabilities of self-driving systems.

How It Works

HERMES leverages a Bird's-Eye View (BEV) representation to consolidate multi-view spatial information while preserving crucial geometric relationships and inter-object interactions. A novel approach introduces "world queries," which integrate external world knowledge into the BEV features through causal attention mechanisms within a Large Language Model (LLM). This allows for contextual enrichment, enhancing both the understanding of the current scene and the generation of future scene evolutions.
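
The interaction between BEV features and world queries can be illustrated with a toy example. This is a minimal sketch of generic single-head causal attention in NumPy, not the HERMES implementation: the token ordering (flattened BEV tokens first, world queries appended after them), the sizes, the random weights, and the names `causal_attention`, `bev_tokens`, and `world_queries` are all assumptions made for illustration.

```python
import numpy as np


def causal_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention where token i only attends to tokens 0..i."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    n = tokens.shape[0]
    # Lower-triangular mask: later positions are hidden from earlier ones.
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
dim = 16            # hypothetical embedding size
num_queries = 3     # hypothetical number of world queries

bev_tokens = rng.normal(size=(4 * 4, dim))         # hypothetical flattened 4x4 BEV grid
world_queries = rng.normal(size=(num_queries, dim))

# Assumed ordering: BEV tokens first, world queries last, so that under the
# causal mask every world query can read the whole BEV sequence.
sequence = np.concatenate([bev_tokens, world_queries], axis=0)
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

out = causal_attention(sequence, w_q, w_k, w_v)
enriched_queries = out[-num_queries:]   # world queries after absorbing scene context
print(enriched_queries.shape)
```

With this ordering the causal mask lets each world query gather information from every BEV token, while the BEV token outputs stay independent of the queries that follow them. How HERMES actually orders tokens and feeds the enriched queries into future scene generation is described in the project's paper and code.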

Quick Start & Requirements

Detailed guides for environment setup, data preparation, weight downloading, training, inference, and evaluation are provided. Specific installation commands are not detailed in the README. The project utilizes datasets such as nuScenes and OmniDrive-nuScenes.

Highlighted Details

  • Achieves state-of-the-art performance on nuScenes and OmniDrive-nuScenes datasets.
  • Demonstrates a 32.4% reduction in generation error.
  • Improves understanding metrics, such as CIDEr, by 8.0%.
  • Code, pretrained weights, and processed data were open-sourced in July 2025.
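
The percentages above are easiest to read as relative changes against a baseline. The sketch below shows that arithmetic, assuming the figures are relative rather than absolute; the baseline and model values are hypothetical placeholders chosen only to reproduce the stated percentages, not results from the project.

```python
def relative_change(baseline: float, new: float) -> float:
    """Signed relative change of `new` with respect to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Hypothetical numbers, for illustration only.
baseline_error, model_error = 1.000, 0.676    # lower is better
baseline_cider, model_cider = 0.500, 0.540    # higher is better

error_reduction = -relative_change(baseline_error, model_error)
cider_gain = relative_change(baseline_cider, model_cider)

print(f"generation error reduction: {error_reduction:.1%}")
print(f"CIDEr improvement: {cider_gain:.1%}")
```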

Maintenance & Community

The project announced its acceptance to ICCV 2025 and the open-sourcing of its code and weights in July 2025. HERMES builds upon foundational work from BEVFormer v2, InternVL, UniPAD, OmniDrive, and DriveMonkey. No direct links to community channels like Discord or Slack are provided.

Licensing & Compatibility

The code is released under the Apache 2.0 license. This license is permissive and generally compatible with commercial use and linking within closed-source projects.

Limitations & Caveats

The project's "To Do" list indicates that DeepSpeed support is not yet released, which may impact distributed training capabilities. The project's reliance on multiple external frameworks may introduce complex dependencies.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 10 stars in the last 30 days
