DSVT by Haiyang-W

CVPR 2023 paper implementation: sparse voxel transformer for 3D point clouds

created 2 years ago
420 stars

Top 71.0% on sourcepulse

Project Summary

DSVT is an efficient and deployment-friendly sparse backbone for large-scale point cloud processing, targeting 3D object detection and BEV map segmentation. It offers state-of-the-art performance with real-time inference speeds, making it suitable for autonomous driving applications.

How It Works

DSVT employs a Dynamic Sparse Voxel Transformer architecture. It partitions local regions within windows based on sparsity and processes them in parallel. A novel rotated set partitioning strategy enables cross-set connections by alternating partitioning configurations across self-attention layers, enhancing feature learning from sparse 3D data.
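
The partitioning idea described above can be illustrated with a small NumPy sketch. This is not the repository's implementation: the function name `partition_sets`, its parameters, and the 2D toy coordinates are hypothetical, and the sketch assumes that "rotating" the partition means switching between x-major and y-major ordering inside each window. It also skips the padding and GPU batching a real implementation would need.

```python
import numpy as np

def partition_sets(coords, window=(4, 4), set_size=3, axis=0):
    """Group non-empty voxels into sets of at most `set_size` per window.

    coords: (N, 2) integer (x, y) coordinates of non-empty voxels.
    axis:   0 orders voxels x-major inside a window, 1 orders them y-major.
    Returns a list of 1-D index arrays into `coords`.
    """
    coords = np.asarray(coords)
    win = coords // np.asarray(window)    # window id of each voxel
    local = coords % np.asarray(window)   # position inside its window
    major, minor = local[:, axis], local[:, 1 - axis]
    # np.lexsort treats the LAST key as primary: window x, window y, major, minor.
    order = np.lexsort((minor, major, win[:, 1], win[:, 0]))
    # Positions in the sorted order where the window id changes.
    win_sorted = win[order]
    changed = np.any(np.diff(win_sorted, axis=0) != 0, axis=1)
    boundaries = np.flatnonzero(changed) + 1
    sets = []
    for group in np.split(order, boundaries):
        # Denser windows simply produce more sets.
        for start in range(0, len(group), set_size):
            sets.append(group[start:start + set_size])
    return sets

# Five non-empty voxels that all fall into one 4x4 window.
coords = [(0, 0), (0, 3), (1, 0), (3, 1), (2, 2)]
print([s.tolist() for s in partition_sets(coords, axis=0)])  # [[0, 1, 2], [4, 3]]
print([s.tolist() for s in partition_sets(coords, axis=1)])  # [[0, 2, 3], [4, 1]]
```

In this toy case voxels 1 and 4 land in different sets under the x-major ordering but share a set under the y-major ordering, which is the kind of cross-set connection that alternating configurations across layers is meant to provide.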

Quick Start & Requirements

  • Installation: Refer to INSTALL.md. Dataset preparation follows OpenPCDet instructions.
  • Training: Uses tools/dist_train.sh with configuration files. Supports multi-GPU and FP16 training.
  • Testing: Uses tools/dist_test.sh.
  • Dependencies: Python 3.8, PyTorch 1.13.1, ONNX 1.12.0, ONNX Runtime 1.10.0, CUDA 11.1, cuDNN 8.6.0, TensorRT 8.5.1.7.
  • Resources: Training on the Waymo dataset takes about 22.5 h on 8x A100 (40 GB) GPUs in FP32, or about 5.5 h on 8x RTX 3090 GPUs in FP32 with 20% of the data. FP16 training can reduce both memory use and training time.
  • Links: INSTALL.md, OpenPCDet, Deployment
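
Before following INSTALL.md, it can help to compare an existing environment against the versions listed above. The following is a minimal sketch, not part of the repository: `PINNED` and `compare_versions` are hypothetical names, and the dictionary keys are assumed to be the PyPI distribution names of the listed packages. Python, CUDA, and cuDNN are not checked here.

```python
from importlib import metadata

# Versions copied from the dependency list above. The keys are ASSUMED
# distribution names and may not match how a given environment was installed.
PINNED = {
    "torch": "1.13.1",
    "onnx": "1.12.0",
    "onnxruntime": "1.10.0",
    "tensorrt": "8.5.1.7",
}

def compare_versions(pinned, lookup=metadata.version):
    """Return {package: (expected, found)} for each mismatch or missing package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "+cu117" are ignored when comparing.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems

if __name__ == "__main__":
    for name, (expected, found) in compare_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package matches its pin. A mismatch is not necessarily a problem, since the source does not say whether other versions work.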

Highlighted Details

  • Achieves state-of-the-art performance on the Waymo (one-stage: 78.2 L1 mAPH, 72.1 L2 mAPH) and nuScenes datasets.
  • TensorRT deployment reaches real-time inference (27 Hz) on an RTX 3090, with a reported latency of 13.8 ms, a significant improvement over PyTorch inference.
  • Outperforms Sparse Convolution (SpConv) by 1.78 L2 mAPH at comparable latency, and is easier to deploy.
  • Has been merged into OpenPCDet and supports multi-frame inputs without any multi-frame-specific design.
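
Throughput and per-frame latency are reciprocals of each other, so the two TensorRT figures above can be cross-checked with a few lines of arithmetic. The helper names below are hypothetical and the sketch assumes frames are processed one at a time, without batching or pipelining.

```python
def hz_to_ms(rate_hz: float) -> float:
    """Per-frame time in milliseconds for a given frame rate."""
    return 1000.0 / rate_hz

def ms_to_hz(latency_ms: float) -> float:
    """Frame rate in Hz for a given per-frame time in milliseconds."""
    return 1000.0 / latency_ms

print(f"{hz_to_ms(27):.1f} ms per frame at 27 Hz")       # 37.0 ms per frame at 27 Hz
print(f"{ms_to_hz(13.8):.1f} Hz at 13.8 ms per frame")   # 72.5 Hz at 13.8 ms per frame
```

Under that assumption, 27 Hz corresponds to about 37 ms per frame while 13.8 ms corresponds to about 72 Hz, so the two numbers probably describe different scopes (for example, the full pipeline versus a single stage). The summary does not say which, so the repository's deployment notes are the place to confirm it.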

Maintenance & Community

  • Active development with recent updates including GiT (ECCV 2024 Oral).
  • Codebase is clean, concise, and relies on minimal dependencies.
  • Primary contact: Haiyang Wang (wanghaiyang6@stu.pku.edu.cn).
  • The project is partially supported by the National Key R&D Program of China and the NSFC.

Licensing & Compatibility

  • The README does not explicitly state a license. The code is released as the official implementation of the CVPR 2023 paper.
  • The Waymo Dataset License Agreement restricts the sharing of pre-trained model weights.

Limitations & Caveats

  • A bug related to position embeddings in DSVTBlock was reported (issue #50) and may since have been fixed.
  • FP16 training may occasionally report gradient NaN errors.
  • Pre-trained models for Waymo are not provided due to dataset licensing.
  • The initial voxel partitioning step, while optimized, consumes considerable time and is an area for potential further acceleration.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 13 stars in the last 90 days
