DSVT by Haiyang-W

CVPR 2023 paper implementation: sparse voxel transformer for 3D point clouds

Created 2 years ago
422 stars

Project Summary

DSVT is an efficient, deployment-friendly sparse backbone for large-scale point cloud processing, targeting 3D object detection and BEV map segmentation. It reports state-of-the-art accuracy on Waymo and nuScenes together with real-time inference under TensorRT, which makes it a candidate for autonomous driving applications.

How It Works

DSVT (Dynamic Sparse Voxel Transformer) partitions each window into local regions (sets) according to its sparsity and processes all sets in parallel. A rotated set partitioning strategy connects these sets by alternating between partitioning configurations in consecutive self-attention layers, which improves feature learning from sparse 3D data.
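The partitioning idea can be illustrated with a small sketch. This is a simplified 2D toy under stated assumptions, not the repository's implementation: the function names, the `axis` flag, and the "x-major"/"y-major" orderings are hypothetical, and the sketch does not pad sets to a uniform size, which a batched implementation would need.

```python
from collections import defaultdict

def window_id(voxel, window_size):
    """Map an (x, y) voxel coordinate to the (wx, wy) window containing it."""
    x, y = voxel
    return (x // window_size, y // window_size)

def partition_sets(voxels, window_size, set_size, axis):
    """Group occupied voxels into per-window sets of at most `set_size`.

    axis=0 sorts each window's voxels x-major, axis=1 sorts them y-major.
    Alternating `axis` between layers changes which voxels share a set.
    """
    windows = defaultdict(list)
    for v in voxels:
        windows[window_id(v, window_size)].append(v)

    if axis == 0:
        key = lambda v: (v[0], v[1])
    else:
        key = lambda v: (v[1], v[0])

    sets = []
    for wid in sorted(windows):
        ordered = sorted(windows[wid], key=key)
        # Chunk the ordered voxels; only occupied voxels are ever visited.
        for start in range(0, len(ordered), set_size):
            sets.append(ordered[start:start + set_size])
    return sets

# Six occupied voxels in one 4x4 window, grouped into sets of three.
voxels = [(0, 0), (0, 3), (1, 1), (2, 0), (3, 2), (3, 3)]
x_major = partition_sets(voxels, window_size=4, set_size=3, axis=0)
y_major = partition_sets(voxels, window_size=4, set_size=3, axis=1)
print(x_major)
print(y_major)
```

In this toy, voxel (0, 3) shares a set with (0, 0) under the x-major ordering but with (3, 3) under the y-major ordering. Attention restricted to one set at a time can therefore still move information across sets when the two orderings alternate between layers.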

Quick Start & Requirements

  • Installation: Refer to INSTALL.md. Dataset preparation follows OpenPCDet instructions.
  • Training: Uses tools/dist_train.sh with configuration files. Supports multi-GPU and FP16 training.
  • Testing: Uses tools/dist_test.sh.
  • Dependencies: Python 3.8, PyTorch 1.13.1, ONNX 1.12.0, ONNX Runtime 1.10.0, CUDA 11.1, cuDNN 8.6.0, TensorRT 8.5.1.7.
  • Resources: Reported Waymo training times are ~22.5 h on 8x A100 (40 GB) GPUs (FP32) and ~5.5 h on 8x RTX 3090 GPUs (FP32, 20% of the data). FP16 can reduce memory use and training time.
  • Links: INSTALL.md, OpenPCDet, Deployment
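Before following INSTALL.md, it can help to compare an existing environment against the versions listed above. The sketch below is one way to do that with the standard library. The PyPI distribution names in `PINNED` are assumptions (the source lists only product names), and `check_versions` is a hypothetical helper, not part of the repository.

```python
from importlib import metadata

# Versions listed in the summary above. The distribution names are
# assumptions; an environment may register these packages differently.
PINNED = {
    "torch": "1.13.1",
    "onnx": "1.12.0",
    "onnxruntime": "1.10.0",
    "tensorrt": "8.5.1.7",
}

def check_versions(pinned, lookup=metadata.version):
    """Return {name: (expected, found, status)} for each pinned package."""
    report = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = (expected, None, "missing")
            continue
        # Ignore local build tags such as "+cu117" when comparing.
        base = found.split("+", 1)[0]
        status = "ok" if base == expected else "mismatch"
        report[name] = (expected, found, status)
    return report

if __name__ == "__main__":
    for name, (expected, found, status) in check_versions(PINNED).items():
        print(f"{name:12s} expected {expected:9s} found {found!s:14s} {status}")
```

The check covers Python packages only. CUDA and cuDNN are system libraries and have to be verified separately.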

Highlighted Details

  • Achieves SOTA performance on the Waymo (one-stage: 78.2 L1 mAPH, 72.1 L2 mAPH) and nuScenes datasets.
  • TensorRT deployment reaches real-time inference (27 Hz is reported), and a latency of 13.8 ms is reported on an RTX 3090, a significant improvement over PyTorch.
  • Outperforms a sparse convolution (SpConv) backbone by 1.78 L2 mAPH at comparable latency, and is easier to deploy.
  • Has been merged into OpenPCDet, and supports multi-frame inputs without any multi-frame-specific design.
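The 27 Hz and 13.8 ms figures in the list above are not reciprocals of each other, so they presumably come from different configurations or measurement scopes; the source does not say which. The conversion below is a minimal sketch, assuming frames are processed one at a time so that throughput is simply the inverse of latency.

```python
def latency_ms_to_hz(latency_ms):
    """Frames per second if frames are processed strictly one at a time."""
    return 1000.0 / latency_ms

def hz_to_latency_ms(hz):
    """Per-frame time budget in milliseconds at a given frame rate."""
    return 1000.0 / hz

print(round(latency_ms_to_hz(13.8), 1))  # 72.5
print(round(hz_to_latency_ms(27.0), 1))  # 37.0
```

Under that assumption, 13.8 ms per frame corresponds to roughly 72 Hz, while 27 Hz corresponds to a budget of roughly 37 ms per frame.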

Maintenance & Community

  • Active development, with recent updates including GiT (ECCV 2024 Oral).
  • Codebase is clean and concise, with minimal dependencies.
  • Primary contact: Haiyang Wang (wanghaiyang6@stu.pku.edu.cn).
  • The project is partially supported by the National Key R&D Program of China and NSFC.

Licensing & Compatibility

  • The README does not explicitly state a license. The code is released as the official implementation of the CVPR 2023 paper.
  • Waymo Dataset License Agreement restricts the sharing of pre-trained model weights.

Limitations & Caveats

  • A bug related to position embeddings in DSVTBlock was reported (issue #50); it may since have been fixed.
  • FP16 training may occasionally report gradient NaN errors.
  • Pre-trained models for Waymo are not provided due to dataset licensing.
  • The initial voxel partitioning step, although optimized, still takes considerable time and could be accelerated further.
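The source does not say how the repository handles the gradient NaN errors mentioned for FP16 training. One common mitigation in mixed-precision training is dynamic loss scaling: skip any step whose gradients are not finite and lower the scale, then raise it again after a run of clean steps. The toy below sketches that logic in plain Python. The `LossScaler` class is hypothetical, and its default values mirror those of PyTorch's `GradScaler`.

```python
import math

class LossScaler:
    """Toy dynamic loss scaler: back off on overflow, grow after a clean streak."""

    def __init__(self, scale=65536.0, backoff=0.5, growth=2.0, growth_interval=2000):
        self.scale = scale
        self.backoff = backoff
        self.growth = growth
        self.growth_interval = growth_interval
        self.clean_steps = 0

    def step(self, scaled_grads):
        """Return unscaled gradients, or None if the step should be skipped."""
        if not all(math.isfinite(g) for g in scaled_grads):
            self.scale *= self.backoff  # overflow or NaN: shrink the scale
            self.clean_steps = 0
            return None                 # caller skips the optimizer update
        self.clean_steps += 1
        grads = [g / self.scale for g in scaled_grads]
        if self.clean_steps >= self.growth_interval:
            self.scale *= self.growth   # long clean streak: try a larger scale
            self.clean_steps = 0
        return grads

scaler = LossScaler(scale=8.0, growth_interval=2)
print(scaler.step([16.0, float("nan")]))  # None (step skipped, scale is now 4.0)
print(scaler.step([4.0, 8.0]))            # [1.0, 2.0]
```

With a scheme like this, an occasional NaN report costs one skipped update rather than a failed run. Persistent NaNs would point to a different problem.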