DSVT is an efficient and deployment-friendly sparse backbone for large-scale point cloud processing, targeting 3D object detection and BEV map segmentation. It offers state-of-the-art performance with real-time inference speeds, making it suitable for autonomous driving applications.
How It Works
DSVT employs a Dynamic Sparse Voxel Transformer architecture. Within each window, it partitions the non-empty voxels into a series of local regions (sets) according to the window's sparsity and computes features for all sets in parallel. A rotated set partitioning strategy alternates between two partitioning configurations in consecutive self-attention layers, so that voxels grouped into different sets in one layer can interact in the next. These cross-set connections improve feature learning from sparse 3D data.
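The partitioning idea can be illustrated with a small sketch. This is not the repository's implementation: the function name `partition_sets`, its parameters, and the simple sort-then-chunk rule are assumptions chosen for illustration, and the example works on 2D (pillar-style) coordinates in pure Python. It groups non-empty voxels by window, sorts them X-major or Y-major, and splits each window into sets of at most `set_size` voxels.

```python
from collections import defaultdict
from math import ceil


def partition_sets(voxels, window_size, set_size, axis):
    """Group non-empty voxels into per-window sets of at most `set_size`.

    voxels: list of (x, y) integer coordinates of non-empty voxels.
    axis:   0 sorts each window X-major, 1 sorts it Y-major.
    Returns a list of sets, each a list of indices into `voxels`.
    (Hypothetical helper, not part of the DSVT codebase.)
    """
    # Assign every voxel to the window that contains it.
    windows = defaultdict(list)
    for idx, (x, y) in enumerate(voxels):
        windows[(x // window_size, y // window_size)].append(idx)

    sets = []
    for key in sorted(windows):
        members = windows[key]
        # The sort order is what gets "rotated" between layers.
        if axis == 0:
            members.sort(key=lambda i: (voxels[i][0], voxels[i][1]))
        else:
            members.sort(key=lambda i: (voxels[i][1], voxels[i][0]))
        # The number of sets depends on how many voxels the window holds.
        n_sets = ceil(len(members) / set_size)
        for s in range(n_sets):
            sets.append(members[s * set_size:(s + 1) * set_size])
    return sets


# Six non-empty voxels that all fall into one 4x4 window.
voxels = [(0, 0), (0, 3), (1, 1), (3, 0), (3, 3), (2, 1)]
x_sets = partition_sets(voxels, window_size=4, set_size=3, axis=0)
y_sets = partition_sets(voxels, window_size=4, set_size=3, axis=1)
print(x_sets)  # [[0, 1, 2], [5, 3, 4]]
print(y_sets)  # [[0, 3, 2], [5, 1, 4]]
```

In this toy input, voxel 3 shares a set with voxels 4 and 5 under the X-major ordering but with voxels 0 and 2 under the Y-major ordering. Alternating the two orderings across attention layers is what lets information flow between sets.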
Quick Start & Requirements
- Installation: Refer to INSTALL.md. Dataset preparation follows the OpenPCDet instructions.
- Training: Uses tools/dist_train.sh with configuration files. Supports multi-GPU and FP16 training.
- Testing: Uses tools/dist_test.sh.
- Dependencies: Python 3.8, PyTorch 1.13.1, ONNX 1.12.0, ONNX Runtime 1.10.0, CUDA 11.1, cuDNN 8.6.0, TensorRT 8.5.1.7.
- Resources: Training on the Waymo dataset takes ~22.5h (FP32) on 8x A100 GPUs (40GB), or ~5.5h (FP32, 20% of the data) on 8x RTX 3090 GPUs. FP16 can reduce memory use and training time.
- Links: INSTALL.md, OpenPCDet, Deployment
Highlighted Details
- Achieves SOTA performance on Waymo (78.2 mAPH L1, 72.1 mAPH L2 one-stage) and NuScenes datasets.
- TensorRT deployment reaches real-time inference (27 Hz), with a reported latency of 13.8 ms on an RTX 3090, a significant speedup over the PyTorch implementation.
- Outperforms Sparse Convolution (SpConv) by +1.78 L2 mAPH at comparable latency, while being easier to deploy.
- Merged into OpenPCDet; supports multi-frame inputs without any multi-frame-specific design.
Maintenance & Community
- Active development with recent updates including GiT (ECCV2024 Oral).
- Codebase is clean, concise, and relies on minimal dependencies.
- Primary contact: Haiyang Wang (wanghaiyang6@stu.pku.edu.cn).
- Project is partially supported by National Key R&D Program of China and NSFC.
Licensing & Compatibility
- The README does not explicitly state a license. The code is the official implementation of the CVPR 2023 paper.
- Waymo Dataset License Agreement restricts the sharing of pre-trained model weights.
Limitations & Caveats
- A bug related to position embeddings in DSVTBlock was reported (issue #50) and may since have been fixed.
- FP16 training may occasionally report gradient NaN errors.
- Pre-trained models for Waymo are not provided due to dataset licensing.
- The initial voxel partitioning step, while optimized, consumes considerable time and is an area for potential further acceleration.
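For context on the FP16 caveat above: under dynamic loss scaling, an occasional non-finite gradient usually means the optimizer step is skipped and the loss scale is lowered, rather than that training has diverged. The pure-Python sketch below illustrates that scheme. Whether DSVT's FP16 path behaves exactly this way is an assumption. The class `ToyLossScaler` is hypothetical, and its default constants are modeled on common dynamic loss scaling defaults rather than taken from the repository.

```python
import math


class ToyLossScaler:
    """Toy dynamic loss scaler (illustrative only, not DSVT or PyTorch code)."""

    def __init__(self, scale=65536.0, backoff=0.5, growth=2.0, growth_interval=2000):
        self.scale = scale
        self.backoff = backoff
        self.growth = growth
        self.growth_interval = growth_interval
        self.good_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should run for these gradients."""
        if all(math.isfinite(g) for g in grads):
            self.good_steps += 1
            # After enough consecutive finite steps, try a larger scale again.
            if self.good_steps >= self.growth_interval:
                self.scale *= self.growth
                self.good_steps = 0
            return True
        # Non-finite gradient: skip the step and shrink the scale.
        self.scale *= self.backoff
        self.good_steps = 0
        return False


scaler = ToyLossScaler(growth_interval=2)
batches = ([0.1, 0.2], [float("nan"), 0.3], [0.1, 0.1], [0.2, 0.2])
history = [scaler.update(g) for g in batches]
print(history)       # [True, False, True, True]
print(scaler.scale)  # 65536.0
```

Only the second batch is skipped. The scale is halved there and restored after two clean steps. If NaN reports persist across many consecutive iterations instead of appearing occasionally, that points to a real instability rather than routine scale adjustment.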