PonderV2 by OpenGVLab

3D pre-training framework for efficient 3D representations

created 1 year ago
356 stars

Top 79.5% on sourcepulse

Project Summary

PonderV2 is a universal pre-training framework for 3D foundation models, enabling efficient learning of 3D representations through differentiable neural rendering. It targets researchers and practitioners in 3D computer vision, offering a unified approach to bridge 2D and 3D data modalities.

How It Works

PonderV2 uses differentiable neural rendering as its core mechanism for pre-training on point clouds. Because rendering connects 3D features to 2D images, the model can learn rich 3D representations while bridging the gap between 2D and 3D data, which is what makes a single, universal pre-training paradigm possible.
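To make the rendering idea concrete, here is a minimal sketch of how samples along one camera ray can be composited into a pixel and compared with a reference image. It is not PonderV2's renderer: the project builds on a NeuS-style renderer, whereas this sketch substitutes plain density-based alpha compositing because it is shorter to show. The names `composite_ray` and `photometric_loss` are hypothetical, and in a real pipeline the densities and colors would be predicted from point-cloud features and differentiated by an autodiff framework.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along a single ray (generic volume rendering).

    densities: non-negative density at each sample along the ray
    colors:    RGB triple at each sample
    deltas:    distance between consecutive samples
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha
    return pixel


def photometric_loss(rendered, target):
    """Mean squared error between a rendered pixel and the reference pixel."""
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)


# An effectively opaque red sample in front hides the green sample behind it.
pixel = composite_ray(
    densities=[1e9, 1.0],
    colors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    deltas=[1.0, 1.0],
)
loss = photometric_loss(pixel, [1.0, 0.0, 0.0])
```

Every step is a smooth function of the densities and colors, so a 2D image loss like this one can push gradients back into whatever 3D network produced them.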

Quick Start & Requirements

  • Installation: Requires Ubuntu 18.04+, CUDA 11.3+, and PyTorch 1.10.0+. Setup consists of creating a conda environment and installing the dependencies: pytorch-cluster, pytorch-scatter, pytorch-sparse, spconv, torch-geometric, opencv-python, open3d, and CLIP. Both spconv and the NeuS renderer must be compiled for a specific CUDA architecture.
  • Data Preparation: Detailed instructions are available in docs/data_preparation.md.
  • Pre-trained Models: Checkpoints are available via docs/model_zoo.md.
  • Documentation: Comprehensive guides for installation and usage are in docs/getting_started.md.
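The version floors listed above can be checked before starting a long install. The following is a small sketch, not part of the PonderV2 repository; `parse_version`, `meets_minimum`, `check_environment`, and the `MINIMUMS` table are hypothetical helpers that encode only the CUDA and PyTorch minimums stated in this summary.

```python
from itertools import takewhile

# Minimum versions taken from the requirements listed above.
MINIMUMS = {"cuda": "11.3", "torch": "1.10.0"}


def parse_version(text):
    """Turn a string such as '1.10.0+cu113' into the tuple (1, 10, 0)."""
    core = text.split("+")[0]  # drop local build suffixes like '+cu113'
    parts = []
    for piece in core.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, so '1.9.1' is older than '1.10.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(found):
    """Map each required component to True/False given detected versions."""
    return {
        name: name in found and meets_minimum(found[name], required)
        for name, required in MINIMUMS.items()
    }


report = check_environment({"torch": "1.10.0+cu113", "cuda": "11.3"})
```

In a live environment the detected strings would typically come from `torch.__version__` and `torch.version.cuda`; they are passed in as plain strings here so the sketch runs without PyTorch installed.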

Highlighted Details

  • Achieves state-of-the-art results on ScanNet and S3DIS for semantic segmentation.
  • Supports multi-dataset training and Point Prompt Training (PPT).
  • Includes implementations for downstream tasks like semantic and instance segmentation.
  • Offers pre-trained checkpoints for indoor and outdoor datasets.

Maintenance & Community

The project is associated with Shanghai AI Lab, HKU, ZJU, and USTC. Updates include bug fixes for indoor pre-training and Structured3D data preprocessing. Further details on the roadmap or community channels are not explicitly provided in the README.

Licensing & Compatibility

The project is released under a permissive license, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The README notes that the indoor pre-training configurations contained bugs before a specific commit, so users should confirm they are running a fixed version. The Structured3D RGB-D preprocessing also had bugs; any data processed with an older version must be regenerated.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 17 stars in the last 90 days
