DSVT by Haiyang-W

CVPR2023 paper implementation: sparse voxel transformer for 3D point clouds

Created 3 years ago

442 stars

Top 67.6% on SourcePulse

Project Summary

DSVT is an efficient and deployment-friendly sparse backbone for large-scale point cloud processing, targeting 3D object detection and BEV map segmentation. It offers state-of-the-art performance with real-time inference speeds, making it suitable for autonomous driving applications.

How It Works

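DSVT is built on a Dynamic Sparse Voxel Transformer. Within each window, it partitions the non-empty voxels into a series of equal-size local sets, where the number of sets depends on how sparse the window is, and computes self-attention on all sets in parallel. A rotated set partitioning strategy alternates the partitioning configuration between consecutive self-attention layers, so voxels that fall into different sets in one layer can interact in the next. These cross-set connections improve feature learning from sparse 3D data.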
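The set partitioning and its rotation can be illustrated with a small pure-Python sketch. It is not the DSVT implementation: it handles a single window with 2D (x, y) voxel coordinates only, and the function names `partition_window` and `rotated_partitions`, the x-major/y-major alternation, and the padding-by-repetition rule are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical layout: (x, y) coordinates of one non-empty voxel inside a single window
Voxel = Tuple[int, int]


def partition_window(voxels: List[Voxel], set_size: int, axis: str) -> List[List[int]]:
    """Split the non-empty voxels of one window into sets of exactly `set_size` indices."""
    if axis == "x":
        # x-major order: walk along x first, then y
        key = lambda i: (voxels[i][0], voxels[i][1])
    elif axis == "y":
        # y-major order: walk along y first, then x
        key = lambda i: (voxels[i][1], voxels[i][0])
    else:
        raise ValueError("axis must be 'x' or 'y'")
    order = sorted(range(len(voxels)), key=key)

    # The number of sets follows the window's sparsity: more voxels -> more sets
    num_sets = math.ceil(len(order) / set_size)
    sets = []
    for s in range(num_sets):
        chunk = order[s * set_size:(s + 1) * set_size]
        # Pad a short final set by cycling through its own members so all sets share one shape
        sets.append([chunk[i % len(chunk)] for i in range(set_size)])
    return sets


def rotated_partitions(voxels: List[Voxel], set_size: int, num_layers: int) -> List[List[List[int]]]:
    """Alternate x-major and y-major partitioning across consecutive attention layers."""
    return [
        partition_window(voxels, set_size, "x" if layer % 2 == 0 else "y")
        for layer in range(num_layers)
    ]


voxels = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
for layer, sets in enumerate(rotated_partitions(voxels, set_size=2, num_layers=2)):
    print(f"layer {layer}: {sets}")
# layer 0: [[0, 1], [2, 3], [4, 4]]
# layer 1: [[0, 2], [4, 1], [3, 3]]
```

Voxels 0 and 2 sit in different sets in layer 0 but share a set in layer 1, which is the kind of cross-set connection the rotation is meant to provide. Because every set has the same size, all sets can be batched into one attention call.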

Quick Start & Requirements

Installation: Refer to INSTALL.md. Dataset preparation follows OpenPCDet instructions.
Training: Uses tools/dist_train.sh with configuration files. Supports multi-GPU and FP16 training.
Testing: Uses tools/dist_test.sh.
Dependencies: Python 3.8, PyTorch 1.13.1, ONNX 1.12.0, ONNX Runtime 1.10.0, CUDA 11.1, cuDNN 8.6.0, TensorRT 8.5.1.7.
Resources: Training on the Waymo dataset requires 8x A100 GPUs (40GB) for ~22.5h (FP32), or 8x RTX 3090 GPUs for ~5.5h (FP32, 20% of the data). FP16 training reduces both memory use and training time.
Links: INSTALL.md, OpenPCDet, Deployment

Highlighted Details

Achieves state-of-the-art performance on the Waymo (78.2 mAPH L1 and 72.1 mAPH L2 for the one-stage model) and nuScenes datasets.
TensorRT deployment enables real-time inference (27 Hz), with a reported latency of 13.8 ms on an RTX 3090, a significant speedup over PyTorch.
Outperforms sparse convolution (SpConv) by 1.78 L2 mAPH at comparable latency, while being easier to deploy.
Has been merged into OpenPCDet, and supports multi-frame inputs without any multi-frame-specific design.
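Frame rate and per-frame latency are reciprocals, so the 27 Hz and 13.8 ms figures are not interchangeable. The short helper below (hypothetical function names) converts between the two units: 27 Hz corresponds to about 37 ms per frame, whereas 13.8 ms per frame would allow roughly 72 Hz. This suggests the two figures measure different scopes (for example, the backbone alone versus the full pipeline), which the summary does not specify.

```python
def hz_to_ms(freq_hz: float) -> float:
    """Per-frame time budget (ms) implied by a frame rate (Hz)."""
    return 1000.0 / freq_hz


def ms_to_hz(latency_ms: float) -> float:
    """Upper bound on frame rate (Hz) if each frame takes `latency_ms`."""
    return 1000.0 / latency_ms


print(f"27 Hz   -> {hz_to_ms(27):.1f} ms per frame")  # 27 Hz   -> 37.0 ms per frame
print(f"13.8 ms -> {ms_to_hz(13.8):.1f} Hz")          # 13.8 ms -> 72.5 Hz
```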

Maintenance & Community

Recent updates include GiT (ECCV 2024 Oral), a related project from the same author.
Codebase is clean, concise, and relies on minimal dependencies.
Primary contact: Haiyang Wang (wanghaiyang6@stu.pku.edu.cn).
Project is partially supported by National Key R&D Program of China and NSFC.

Licensing & Compatibility

The repository does not explicitly state a license in the README. The code is released as the official implementation of the CVPR 2023 paper.
Waymo Dataset License Agreement restricts the sharing of pre-trained model weights.

Limitations & Caveats

A bug related to position embeddings in DSVTBlock was reported (issue #50) and may have since been fixed.
FP16 training may occasionally report gradient NaN errors.
Pre-trained models for Waymo are not provided due to dataset licensing.
The initial voxel partitioning step, although already optimized, still takes considerable time and remains a candidate for further acceleration.
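Gradient NaN/inf errors in FP16 training are commonly handled with dynamic loss scaling. The pure-Python sketch below illustrates that generic technique on a plain list of gradient values; it is not DSVT code, and the class name `DynamicLossScaler` is hypothetical. Its defaults mirror the documented defaults of PyTorch's `torch.cuda.amp.GradScaler` (initial scale 2^16, backoff factor 0.5, growth factor 2.0, growth interval 2000); whether DSVT's FP16 path relies on that class is not stated in this summary.

```python
import math
from typing import List


class DynamicLossScaler:
    """Simplified dynamic loss scaling for mixed-precision training.

    The loss is multiplied by `scale` before backprop so small FP16 gradients
    do not underflow. If any scaled gradient is inf/NaN, the step is skipped
    and the scale shrinks; after `growth_interval` clean steps it grows again.
    """

    def __init__(self, init_scale: float = 2.0 ** 16, backoff: float = 0.5,
                 growth: float = 2.0, growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.backoff = backoff
        self.growth = growth
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def step(self, scaled_grads: List[float]) -> bool:
        """Return True if the optimizer step should be applied."""
        if any(not math.isfinite(g) for g in scaled_grads):
            # Overflow detected: skip this update and lower the scale
            self.scale *= self.backoff
            self._clean_steps = 0
            return False
        self._clean_steps += 1
        if self._clean_steps == self.growth_interval:
            # A long run of finite gradients: try a larger scale
            self.scale *= self.growth
            self._clean_steps = 0
        return True

    def unscale(self, scaled_grads: List[float]) -> List[float]:
        """Divide out the loss scale before the optimizer uses the gradients."""
        return [g / self.scale for g in scaled_grads]


scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
print(scaler.step([1.0, float("nan")]), scaler.scale)          # False 4.0
print(scaler.step([0.5]), scaler.step([0.25]), scaler.scale)  # True True 8.0
```

Skipping the update rather than crashing means an occasional NaN costs one iteration instead of the whole run.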

Health Check

Last Commit

1 year ago

Responsiveness

1 day

Pull Requests (30d)

0

Issues (30d)

0

Star History

7 stars in the last 30 days

Explore Similar Projects

PonderV2 by OpenGVLab

3D pre-training framework for efficient 3D representations

Created 2 years ago

Updated 3 months ago

ml-cubifyanything by apple

Scaling indoor 3D object detection and spatial understanding

Created 9 months ago

Updated 2 months ago

UniTR by Haiyang-W

Research paper for multi-modal 3D perception using unified transformer

Created 2 years ago

Updated 1 year ago

Starred by

Tri Dao (Chief Scientist at Together AI),

Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and

1 more.

oslo by tunib-ai

Framework for large-scale transformer optimization

Created 4 years ago

Updated 3 years ago

Starred by

Saining Xie (Professor at NYU).

Grendel-GS by nyu-systems

Research paper on distributed training system for 3D Gaussian Splatting

Created 2 years ago

Updated 3 months ago

openscene by pengsongyou

3D scene understanding research paper using open-vocabulary queries

Created 2 years ago

Updated 2 years ago

CVPR2023-3D-Occupancy-Prediction by CVPR2023-3D-Occupancy-Prediction

3D occupancy prediction benchmark for autonomous driving scene perception

Created 2 years ago

Updated 2 years ago

Starred by

Amit Jain (Cofounder of Luma AI),

Chuan Li (Chief Scientific Officer at Lambda), and

2 more.

NSVF by facebookresearch

Research paper implementation for neural sparse voxel fields (NSVF)

Created 6 years ago

Updated 2 years ago

nnDetection by MIC-DKFZ

Self-configuring framework for 3D medical object detection

Created 4 years ago

Updated 2 months ago

Pointcept by Pointcept

Point cloud perception codebase for research

Created 2 years ago

Updated 1 week ago

EfficientDet by xuannianz

Keras/TensorFlow implementation for EfficientDet object detection

Created 6 years ago

Updated 2 years ago

Starred by

Aravind Srinivas (Cofounder of Perplexity),

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and

13 more.

pytorch3d by facebookresearch

PyTorch3D is a PyTorch library for 3D deep learning research

Created 6 years ago

Updated 3 days ago
