GPT4Point by Pointcept

3D multi-modality model aligns point clouds with language

created 2 years ago
410 stars

Project Summary

GPT4Point offers a unified framework for 3D point cloud and language understanding and generation, aimed at researchers and developers in 3D computer vision and multimodal AI. It supports tasks such as 3D captioning and controlled 3D generation, and it comes with a new point-language dataset annotation engine and an evaluation benchmark.

How It Works

GPT4Point integrates a 3D multimodal large language model (MLLM) for point-text tasks. It aligns 3D point cloud data with language representations, which enables downstream point-text tasks such as captioning and text-controlled generation. The framework also introduces Pyramid-XL, an automated annotation engine for building large-scale point-language datasets, and a dedicated object-level benchmark for robust evaluation.
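To make "aligning point clouds with language" concrete, here is a generic symmetric contrastive (InfoNCE-style, as popularized by CLIP) loss between point-cloud embeddings and text embeddings. This is an illustrative sketch under assumptions, not GPT4Point's code: the toy vectors stand in for encoder outputs, and the function names and the 0.07 temperature are hypothetical choices.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_matrix(points: List[Vector], texts: List[Vector],
                      temperature: float) -> List[List[float]]:
    """Cosine similarity of every point embedding with every text embedding, divided by temperature."""
    p = [l2_normalize(v) for v in points]
    t = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(pi, tj)) / temperature for tj in t] for pi in p]


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i (the matching pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def point_text_contrastive_loss(points: List[Vector], texts: List[Vector],
                                temperature: float = 0.07) -> float:
    """Hypothetical symmetric loss: average of point-to-text and text-to-point cross-entropy."""
    logits = similarity_matrix(points, texts, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


if __name__ == "__main__":
    pts = [[1.0, 0.0], [0.0, 1.0]]
    print("matched captions:", point_text_contrastive_loss(pts, [[1.0, 0.0], [0.0, 1.0]]))
    print("swapped captions:", point_text_contrastive_loss(pts, [[0.0, 1.0], [1.0, 0.0]]))
```

Matched point-caption pairs on the diagonal give a loss near zero, while swapping the captions makes it large. Averaging both directions is a common design choice so that points retrieve their captions and captions retrieve their points equally well.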

Quick Start & Requirements

  • Install: pip install salesforce-lavis, or clone the repository and run pip install -e . for a development install.
  • Prerequisites: Python 3.8 and PyTorch. The Cap3D point cloud data and annotations must be downloaded separately.
  • Resources: Training runs as multi-GPU distributed jobs.
  • Links: Project Page, Cap3D Dataset, Objaverse-XL Download

Highlighted Details

  • CVPR'24 Highlight paper (2.84% acceptance rate).
  • Unified framework for point-language understanding and generation.
  • Introduces Pyramid-XL, an automated point-language dataset annotation engine.
  • Establishes a novel object-level point cloud benchmark with comprehensive metrics.
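The benchmark's metrics are not listed here. As one hedged example of the kind of metric used to evaluate point-text alignment, this sketch computes retrieval Recall@K from a point-to-text similarity matrix, assuming the correct match for row i is column i. Whether GPT4Point's benchmark uses exactly this formulation is an assumption, and recall_at_k and example_similarity are hypothetical names.

```python
from typing import List


def recall_at_k(similarity: List[List[float]], k: int) -> float:
    """Fraction of queries (rows) whose ground-truth match (same index) ranks within the top k columns."""
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate columns by descending similarity; the sort is stable, so ties keep index order.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Hypothetical scores: rows are point clouds, columns are captions.
example_similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"R@{k}:", recall_at_k(example_similarity, k))
```

Reporting several cutoffs (for example R@1 alongside larger K) shows both how often the top-ranked caption is exactly right and how close the correct caption usually lands.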

Maintenance & Community

  • As of the April 2024 updates, the project was under active development, with recent work focused on evaluation functionality.
  • Key components like the dataset and initial training/evaluation code have been released.
  • Related works include Point-Bind, Point-LLM, 3D-LLM, and PointLLM.

Licensing & Compatibility

  • Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
  • Non-commercial use restriction applies.

Limitations & Caveats

As of the latest update, the training section is still marked as needing modification. The Pyramid-XL dataset and engine, along with additional evaluation and training code, have not yet been released.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 23 stars in the last 90 days
