HE-Drive by jmwang0117

Human-like end-to-end driving system

Created 1 year ago
253 stars

Top 99.3% on SourcePulse

Project Summary

HE-Drive is an end-to-end autonomous driving system designed to generate human-like, temporally consistent, and comfortable driving trajectories. It targets researchers and developers in the autonomous driving field, offering a novel approach that significantly reduces collision rates and improves computational speed while prioritizing passenger comfort.

How It Works

HE-Drive employs a multi-stage approach: sparse perception extracts key 3D spatial representations, a DDPM-based motion planner generates diverse, multi-modal trajectories, and a Vision Language Model (VLM)-guided scorer selects the most comfortable option. This integration of VLMs for assessing driving style and comfort is a novel aspect, aiming to mimic human driving nuances.
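The three-stage flow described above can be sketched in miniature. This is a minimal illustrative skeleton, not the project's actual code: the function names, the comfort heuristic (penalizing lateral jerk), and the trajectory format are all assumptions made for clarity.

```python
import random

def sparse_perception(sensor_frames):
    # Hypothetical stand-in: distill raw frames into a compact 3D scene representation
    return {"agents": sensor_frames, "map": "sparse_bev"}

def ddpm_planner(scene, num_modes=3):
    # Stand-in for the DDPM-based planner: sample several multi-modal
    # candidate trajectories (here, 5 waypoints of (longitudinal, lateral) each)
    return [[(t * 0.5, random.uniform(-1.0, 1.0)) for t in range(5)]
            for _ in range(num_modes)]

def vlm_comfort_score(scene, trajectory):
    # Stand-in for the VLM-guided scorer: a smoother lateral profile
    # (less side-to-side change between waypoints) scores higher
    lateral = [y for _, y in trajectory]
    return -sum(abs(b - a) for a, b in zip(lateral, lateral[1:]))

def plan(sensor_frames):
    scene = sparse_perception(sensor_frames)
    candidates = ddpm_planner(scene)
    # Select the most comfortable of the diverse candidates
    return max(candidates, key=lambda traj: vlm_comfort_score(scene, traj))

best = plan(sensor_frames=["frame_t0", "frame_t1"])
print(len(best))  # 5 waypoints
```

The key structural idea is the separation of concerns: the planner only proposes diverse options, and a separate scorer, informed by scene context, picks among them.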

Quick Start & Requirements

  • Installation: Requires setting up a Python 3.8 conda environment (hedrive), installing specific PyTorch (1.13.0+cu116) and torchvision versions, and running pip3 install -r requirement.txt. CUDA operations (deformable_aggregation) must be compiled.
  • Prerequisites: The NuScenes dataset and its CAN bus expansion are needed, along with Ollama 0.4 and the llama3.2-vision-11b model.
  • Hardware: The Llama 3.2 Vision 11B model requires a minimum of 8GB of VRAM.
  • Links: HE-Drive paper on arXiv.
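The setup steps above can be sketched as a shell session. This is a hedged outline of the documented requirements, not the repository's verbatim instructions: the torchvision version pin and the path of the deformable_aggregation op are assumptions and should be checked against the project README.

```shell
# Python 3.8 environment named "hedrive", per the README
conda create -n hedrive python=3.8 -y
conda activate hedrive

# Pinned PyTorch build; the torchvision pin (0.14.0) is the usual pairing
# for torch 1.13.0 and is an assumption here
pip3 install torch==1.13.0+cu116 torchvision==0.14.0+cu116 \
    --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirement.txt

# Compile the custom CUDA op; the directory shown is hypothetical
cd <path-to-deformable_aggregation> && python setup.py develop && cd -

# VLM backend: Ollama 0.4 serving the Llama 3.2 Vision 11B model
ollama pull llama3.2-vision:11b
```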

Highlighted Details

  • Significantly reduces collision rates compared to existing solutions.
  • Achieves improved computational speed.
  • Prioritizes passenger comfort through VLM-guided trajectory scoring.
  • Leverages sparse perception for efficient 3D spatial representation.

Maintenance & Community

The project README does not provide details on community channels (e.g., Discord, Slack), active maintainers, or a roadmap. It encourages users to star the repository.

Licensing & Compatibility

The README does not specify a software license, so by default all rights remain with the authors. Commercial use or closed-source integration cannot be assumed without clarification from the maintainers.

Limitations & Caveats

The primary adoption blocker is the significant VRAM requirement (8GB) for the Llama 3.2-Vision 11B model, which is integral to the system's VLM-guided scoring. Installation involves compiling custom CUDA operations, which can be complex and platform-dependent. The system relies on specific external datasets (NuScenes) and checkpoints (SparseDrive).

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days
