recogdrive by xiaomi-research

Cognitive reinforcement learning for end-to-end autonomous driving

Created 8 months ago

449 stars

Top 67.0% on SourcePulse

Project Summary

Summary

ReCogDrive is a reinforced cognitive framework for end-to-end autonomous driving that addresses the slow inference and infeasible actions inherent in language-modeling approaches. Aimed at researchers and engineers, it unifies driving understanding and planning, improves safety and comfort, and reports state-of-the-art performance on several benchmarks.

How It Works

The framework pairs an autoregressive vision-language model (VLM) with a diffusion planner and uses a hierarchical data pipeline to instill human driving cognition. The VLM's driving priors are injected into the diffusion planner, which generates continuous trajectories efficiently and avoids the mismatch between language output and driving actions. A final DiffGRPO stage further improves safety and comfort.
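The summary does not spell out how DiffGRPO works. Assuming it follows the group-relative idea its name suggests, the core step can be sketched as follows: several candidate trajectories are sampled for one scene, each is scored, and each score is normalized against the mean and spread of its own group. The function name, the `eps` parameter, and the reward values are illustrative assumptions, not taken from the ReCogDrive code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Illustrative sketch only: the name and signature are hypothetical
    and are not part of the ReCogDrive repository.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four trajectories sampled for the same scene
rewards = [0.9, 0.5, 0.7, 0.3]
advantages = group_relative_advantages(rewards)
```

Trajectories that score above the group mean receive a positive advantage and are reinforced; those below the mean receive a negative one. No separate value network is needed, because the group itself serves as the baseline.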

Quick Start & Requirements

Setup involves cloning the ReCogDrive GitHub repository, downloading NAVSIM datasets, preparing the environment, and running provided training/evaluation scripts. Specific hardware or software dependencies are not detailed. Links to the GitHub repo, Hugging Face weights, and datasets are provided.

Highlighted Details

Achieves state-of-the-art results on NAVSIM (up to 90.8 PDMS), Bench2Drive (138.18 Closed-loop Metric), DriveLM (67.30 GPT-Score), and DriveBench (56.71 Avg.).
Pretrained on 12 diverse open-source driving datasets, many of them re-annotated or filtered, plus 752k NAVSIM QA pairs.
Demonstrates strong scene comprehension and adaptability across various driving scenarios.
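For readers unfamiliar with the PDMS figure quoted above, the sketch below shows how such a score is assembled, assuming the weighting published with the NAVSIM benchmark: two multiplicative penalties (no at-fault collision, drivable-area compliance) applied to a weighted average of ego progress, time-to-collision, and comfort. The function name and the sub-score values are hypothetical and are not ReCogDrive's reported numbers.

```python
def pdm_score(nc, dac, ep, ttc, comfort):
    """Combine NAVSIM-style sub-scores (each in [0, 1]) into one score.

    Assumed weighting: NC * DAC * (5*EP + 5*TTC + 2*C) / 12.
    The function name is illustrative, not part of the NAVSIM API.
    """
    weighted = (5 * ep + 5 * ttc + 2 * comfort) / 12
    # nc and dac act as gates: a collision or off-road drive zeroes the score
    return nc * dac * weighted


# Hypothetical sub-scores for a single scenario
score = pdm_score(nc=1.0, dac=1.0, ep=0.8, ttc=1.0, comfort=1.0)
print(f"PDMS: {100 * score:.1f}")
```

Because the first two terms multiply the rest, a planner cannot offset a collision or a drivable-area violation with good progress or comfort.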

Maintenance & Community

The project lists multiple contributors, with Haiyang Sun as project leader and Xinggang Wang as corresponding author. Recent updates cover the paper, model, and dataset releases, as well as bug fixes. No direct community channels or roadmaps are linked.

Licensing & Compatibility

The code's license is unspecified. However, the project explicitly restricts both the datasets and the framework itself to "academic research purposes, with no commercial applications involved." Users must also comply with the original dataset licenses, which further limits commercial use.

Limitations & Caveats

The primary limitation is the explicit restriction to academic research, which rules out commercial applications. Users must also adhere to the original licenses of the many pretraining datasets, which may impose additional constraints.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

24 stars in the last 30 days