GPT4Point by Pointcept

3D multi-modality model aligns point clouds with language

Created 2 years ago

441 stars

Top 67.1% on SourcePulse

Project Summary

GPT4Point offers a unified framework for 3D point cloud and language understanding and generation, targeting researchers and developers in 3D computer vision and multimodal AI. It enables tasks like 3D captioning and controlled 3D generation, leveraging a novel dataset annotation engine and benchmark.

How It Works

GPT4Point integrates a 3D multimodal large language model (MLLM) for point-text tasks. It aligns 3D point cloud data with language representations, facilitating a range of downstream applications. The framework also introduces Pyramid-XL, an automated annotation engine for creating large-scale point-language datasets, and a dedicated object-level benchmark for robust evaluation.
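As a rough illustration of what aligning point clouds with language can look like, the sketch below scores hypothetical point-cloud embeddings against caption embeddings by cosine similarity and picks the closest caption for each object. The toy vectors, the function names, and the similarity-based matching are all assumptions made for illustration; this is not GPT4Point's actual architecture or API.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def match_captions(
    point_embs: Sequence[Sequence[float]],
    text_embs: Sequence[Sequence[float]],
) -> List[int]:
    """For each point-cloud embedding, return the index of the closest caption."""
    matches = []
    for p in point_embs:
        scores = [cosine_similarity(p, t) for t in text_embs]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches


# Hypothetical 3-dimensional embeddings (assumption); real encoders
# produce much larger vectors.
point_embeddings = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
caption_embeddings = [[0.1, 0.1, 0.9], [1.0, 0.0, 0.1]]
best = match_captions(point_embeddings, caption_embeddings)
```

In a shared embedding space of this kind, retrieval and captioning-style tasks reduce to comparing vectors, which is one reason alignment is a common first step in multimodal models.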

Quick Start & Requirements

Install: run pip install salesforce-lavis, or, for development, clone the repository and run pip install -e . from its root.
Prerequisites: Python 3.8 and PyTorch. The Cap3D point cloud data and annotations must be downloaded separately.
Resources: Training involves multi-GPU distributed execution.
Links: Project Page, Cap3D Dataset, Objaverse-XL Download
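Before installing, it can help to confirm that the stated prerequisites are in place. The following is a minimal sketch using only the Python standard library; the check_environment helper is hypothetical and not part of the project. It relies on the salesforce-lavis package being importable as lavis.

```python
import importlib.util
import sys
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether the stated prerequisites appear to be met."""
    return {
        # The summary names Python 3.8; other versions are not covered here.
        "python_3_8": sys.version_info[:2] == (3, 8),
        # find_spec returns None when a top-level module is not installed.
        "torch": importlib.util.find_spec("torch") is not None,
        # The salesforce-lavis distribution is imported as `lavis`.
        "lavis": importlib.util.find_spec("lavis") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

The helper only checks that the modules can be located; it does not verify GPU availability or that the Cap3D data has been downloaded.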

Highlighted Details

CVPR'24 Highlight paper (2.84% acceptance rate).
Unified framework for point-language understanding and generation.
Introduces Pyramid-XL, an automated point-language dataset annotation engine.
Establishes a novel object-level point cloud benchmark with comprehensive metrics.

Maintenance & Community

The most recent updates, in April 2024, focused on evaluation functionality; the project was described as under active development at that time.
Key components like the dataset and initial training/evaluation code have been released.
Related works include Point-Bind, Point-LLM, 3D-LLM, and PointLLM.

Licensing & Compatibility

Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
Commercial use is not permitted, and derivative works must be shared under the same license.

Limitations & Caveats

As of the latest update, the training instructions are flagged as needing revision. The Pyramid-XL dataset and engine, along with additional evaluation and training code, have not yet been released.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days