HE-Drive by jmwang0117

Human-like end-to-end driving system

Created 1 year ago
253 stars

Top 99.3% on SourcePulse

Project Summary

HE-Drive is an end-to-end autonomous driving system designed to generate human-like, temporally consistent, and comfortable driving trajectories. It targets researchers and developers in the autonomous driving field, offering a novel approach that significantly reduces collision rates and improves computational speed while prioritizing passenger comfort.

How It Works

HE-Drive employs a multi-stage approach: sparse perception extracts key 3D spatial representations, a DDPM-based motion planner generates diverse, multi-modal trajectories, and a Vision Language Model (VLM)-guided scorer selects the most comfortable option. This integration of VLMs for assessing driving style and comfort is a novel aspect, aiming to mimic human driving nuances.
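The three-stage flow described above can be sketched in miniature. This is a minimal illustrative skeleton, not the project's actual code: the function names, the comfort heuristic (penalizing lateral jerk), and the trajectory format are all assumptions made for clarity.

```python
import random

def sparse_perception(sensor_frames):
    # Hypothetical stand-in: distill raw frames into a compact 3D scene representation
    return {"agents": sensor_frames, "map": "sparse_bev"}

def ddpm_planner(scene, num_modes=3):
    # Stand-in for the DDPM-based planner: sample several multi-modal
    # candidate trajectories (here, 5 waypoints of (longitudinal, lateral) each)
    return [[(t * 0.5, random.uniform(-1.0, 1.0)) for t in range(5)]
            for _ in range(num_modes)]

def vlm_comfort_score(scene, trajectory):
    # Stand-in for the VLM-guided scorer: a smoother lateral profile
    # (less side-to-side change between waypoints) scores higher
    lateral = [y for _, y in trajectory]
    return -sum(abs(b - a) for a, b in zip(lateral, lateral[1:]))

def plan(sensor_frames):
    scene = sparse_perception(sensor_frames)
    candidates = ddpm_planner(scene)
    # Select the most comfortable of the diverse candidates
    return max(candidates, key=lambda traj: vlm_comfort_score(scene, traj))

best = plan(sensor_frames=["frame_t0", "frame_t1"])
print(len(best))  # 5 waypoints
```

The key structural idea is the separation of concerns: the planner only proposes diverse options, and a separate scorer, informed by scene context, picks among them.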

Quick Start & Requirements

  • Installation: Requires setting up a Python 3.8 conda environment (hedrive), installing specific PyTorch (1.13.0+cu116) and torchvision versions, and running pip3 install -r requirement.txt. CUDA operations (deformable_aggregation) must be compiled.
  • Prerequisites: The NuScenes dataset and its CAN bus expansion are needed, along with Ollama 0.4 and the llama3.2-vision-11b model.
  • Hardware: The Llama 3.2 Vision 11B model requires a minimum of 8GB of VRAM.
  • Links: HE-Drive paper on arXiv.
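The setup steps above can be sketched as a shell session. This is a hedged outline of the documented requirements, not the repository's verbatim instructions: the torchvision version pin and the path of the deformable_aggregation op are assumptions and should be checked against the project README.

```shell
# Python 3.8 environment named "hedrive", per the README
conda create -n hedrive python=3.8 -y
conda activate hedrive

# Pinned PyTorch build; the torchvision pin (0.14.0) is the usual pairing
# for torch 1.13.0 and is an assumption here
pip3 install torch==1.13.0+cu116 torchvision==0.14.0+cu116 \
    --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirement.txt

# Compile the custom CUDA op; the directory shown is hypothetical
cd <path-to-deformable_aggregation> && python setup.py develop && cd -

# VLM backend: Ollama 0.4 serving the Llama 3.2 Vision 11B model
ollama pull llama3.2-vision:11b
```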

Highlighted Details

  • Significantly reduces collision rates compared to existing solutions.
  • Achieves improved computational speed.
  • Prioritizes passenger comfort through VLM-guided trajectory scoring.
  • Leverages sparse perception for efficient 3D spatial representation.

Maintenance & Community

The project README does not provide details on community channels (e.g., Discord, Slack), active maintainers, or a roadmap. It encourages users to star the repository.

Licensing & Compatibility

The README does not specify a software license, so by default all rights remain with the authors. Commercial use or closed-source integration cannot be assumed without clarification from the maintainers.

Limitations & Caveats

The primary adoption blocker is the significant VRAM requirement (8GB) for the Llama 3.2-Vision 11B model, which is integral to the system's VLM-guided scoring. Installation involves compiling custom CUDA operations, which can be complex and platform-dependent. The system relies on specific external datasets (NuScenes) and checkpoints (SparseDrive).

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days
