DriveDreamer2 by f1yfisher

LLM-enhanced world models for driving video generation

Created 2 years ago

265 stars

Top 96.3% on SourcePulse

Summary

DriveDreamer-2 generates customized driving videos by integrating Large Language Models (LLMs) into a world model for autonomous driving. It targets researchers and developers who need diverse, user-defined driving scenarios for training and evaluation. Its key benefit is that specific, uncommon driving events can be generated from natural-language prompts, which supplies additional training data for perception systems while reporting state-of-the-art video generation quality.

How It Works

The system works in three stages. First, an LLM interface translates the user's query into agent trajectories. Second, these trajectories condition the generation of a High-Definition Map (HDMap) that adheres to traffic regulations. Finally, a Unified Multi-View Model generates the multi-view driving videos with temporal and spatial coherence across views, which makes complex, customized scenarios possible.
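The three stages above can be sketched as a simple data flow. This is a minimal illustration under stated assumptions, not the project's code: every class and function name below is hypothetical, the LLM and the video model are replaced by toy stand-ins, and the default of six camera views is an assumption chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in metres


@dataclass
class AgentTrajectory:  # hypothetical container, not from the repository
    agent_id: str
    waypoints: List[Point]  # one position per timestep


@dataclass
class HDMap:  # hypothetical: lanes reduced to straight centerlines
    lane_centerlines: List[List[Point]]


@dataclass
class MultiViewVideo:  # hypothetical: only the output shape is modelled
    num_views: int
    num_frames: int


def prompt_to_trajectories(prompt: str, steps: int = 8) -> List[AgentTrajectory]:
    """Stage 1 stand-in: a real system would query an LLM here."""
    # The ego vehicle drives straight ahead in the lane at x = 0.
    agents = [AgentTrajectory("ego", [(0.0, 2.0 * t) for t in range(steps)])]
    if "cut-in" in prompt.lower():
        # A second vehicle starts in the adjacent lane (x = 3.5) ahead of
        # the ego vehicle and merges linearly into the ego lane.
        merge = [(3.5 * (1 - t / (steps - 1)), 10.0 + 2.0 * t) for t in range(steps)]
        agents.append(AgentTrajectory("vehicle_1", merge))
    return agents


def trajectories_to_hdmap(trajectories: List[AgentTrajectory]) -> HDMap:
    """Stage 2 stand-in: one straight lane per distinct starting lateral offset."""
    ys = [y for traj in trajectories for _, y in traj.waypoints]
    lane_xs = sorted({traj.waypoints[0][0] for traj in trajectories})
    return HDMap([[(x, min(ys)), (x, max(ys))] for x in lane_xs])


def render_multiview(trajectories: List[AgentTrajectory], hdmap: HDMap,
                     num_views: int = 6) -> MultiViewVideo:
    """Stage 3 stand-in: a real system would run the video model here."""
    return MultiViewVideo(num_views=num_views, num_frames=len(trajectories[0].waypoints))


def generate(prompt: str, num_views: int = 6) -> MultiViewVideo:
    """Chain the three stages: prompt -> trajectories -> HDMap -> video."""
    trajectories = prompt_to_trajectories(prompt)
    hdmap = trajectories_to_hdmap(trajectories)
    return render_multiview(trajectories, hdmap, num_views)


if __name__ == "__main__":
    print(generate("A vehicle makes an abrupt cut-in ahead of the ego car"))
```

The point of the sketch is the ordering of the conditioning signals: the text prompt only ever reaches the video stage indirectly, through trajectories and the map derived from them.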

Quick Start & Requirements

The README links to a download for the model weights and preprocessing files. It outlines sections for "Installation", "Prepare Dataset & Env", and "Train, Test, Visualization Demo", but the provided snippet does not give installation commands, prerequisites (e.g., hardware or software versions), or estimated setup times.

Highlighted Details

State-of-the-Art Performance & Customization: DriveDreamer-2 reports an FID of 11.2 and an FVD of 55.7, relative improvements of about 30% and 50%, respectively, over prior methods. It is also described as the first world model to generate customized driving videos from LLM prompts, including uncommon events such as abrupt cut-ins.
Enhanced Perception Training: The generated videos are reported to improve the training of autonomous driving perception methods, such as 3D detection and tracking, by providing diverse and challenging data.
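For readers unfamiliar with how such percentages are derived: FID and FVD are lower-is-better metrics, so a relative improvement is the fractional reduction against a baseline score. The sketch below shows that arithmetic. The function names are hypothetical, and the implied baselines it prints are a back-of-the-envelope inversion that assumes the 30% and 50% figures are exact; they are not scores reported in the source.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Fractional reduction of a lower-is-better metric (e.g. FID, FVD)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


def implied_baseline(new: float, improvement: float) -> float:
    """Invert relative_improvement: the baseline that yields `improvement`."""
    if not 0 <= improvement < 1:
        raise ValueError("improvement must be in [0, 1)")
    return new / (1.0 - improvement)


if __name__ == "__main__":
    # Reported scores from the summary; the improvements are treated as
    # exact here, which is an assumption made only for illustration.
    for name, score, gain in [("FID", 11.2, 0.30), ("FVD", 55.7, 0.50)]:
        base = implied_baseline(score, gain)
        check = relative_improvement(base, score)
        print(f"{name}: score={score}, implied baseline={base:.1f}, check={check:.2f}")
```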

Maintenance & Community

The paper was accepted to AAAI 2025, and the project released inference code and model weights on December 18, 2024. According to the README, the team is working on releasing the full code and has also introduced the related works DriveDreamer4D and ReconDreamer.

Licensing & Compatibility

The provided README does not specify a license type, nor does it offer compatibility notes for commercial use or closed-source linking.

Limitations & Caveats

The README states that the full code has not yet been released, so the repository is incomplete in its current state. Installation and environment setup instructions are also not detailed in the provided text.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

8 stars in the last 30 days