PointLLM by InternRobotics

Multimodal LLM for understanding point clouds

created 1 year ago
848 stars

Top 43.0% on sourcepulse

Project Summary

PointLLM is a multimodal large language model that interprets colored point clouds, reasoning about an object's type, geometry, and appearance. It targets researchers and developers working with 3D data, offering a novel approach to 3D scene understanding and interaction through natural language.

How It Works

PointLLM integrates a point cloud encoder with a large language model (LLM) backbone. The point encoder extracts features from input point clouds, projecting them into the LLM's latent space. The LLM then processes sequences of these point tokens alongside text tokens to generate responses, allowing for nuanced understanding and dialogue about 3D objects.
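The data flow described above can be illustrated with a toy sketch. It is not PointLLM's implementation: the dimensions, the single linear projector, and the names `linear` and `build_input_sequence` are all assumptions chosen for illustration, and the encoder output and text embeddings are stand-in values.

```python
import random

def linear(vec, weight):
    """Multiply a feature vector by a weight matrix given as a list of rows."""
    return [sum(v * w for v, w in zip(vec, row)) for row in weight]

def build_input_sequence(point_features, projector, text_before, text_after):
    """Project point features into the LLM embedding size, then splice the
    resulting point tokens between two runs of text token embeddings."""
    point_tokens = [linear(feature, projector) for feature in point_features]
    return text_before + point_tokens + text_after

# Toy sizes (hypothetical; real encoders and LLMs are far larger).
ENC_DIM, LLM_DIM = 4, 6
rng = random.Random(0)

# Projector with LLM_DIM rows of ENC_DIM weights: maps ENC_DIM -> LLM_DIM.
projector = [[rng.uniform(-1, 1) for _ in range(ENC_DIM)] for _ in range(LLM_DIM)]

# Stand-ins for the point encoder output (3 point tokens) and text embeddings.
point_features = [[rng.uniform(-1, 1) for _ in range(ENC_DIM)] for _ in range(3)]
text_before = [[0.0] * LLM_DIM for _ in range(2)]
text_after = [[0.0] * LLM_DIM for _ in range(5)]

sequence = build_input_sequence(point_features, projector, text_before, text_after)
print(len(sequence), len(sequence[0]))  # prints "10 6"
```

The point is only that, after projection, point tokens have the same width as text embeddings, so the LLM can consume one mixed sequence.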

Quick Start & Requirements

  • Install: Clone the repository, then run pip install -e . inside it to install dependencies. Training additionally requires ninja and flash-attn.
  • Data: Requires downloading ~77GB of Objaverse colored point cloud data and instruction-following datasets.
  • Environment: Tested on Ubuntu 20.04 with NVIDIA drivers, CUDA 11.7, Python 3.10, PyTorch 2.0.1, and Transformers 4.28.0.
  • Hardware: GPU memory requirements range from 14GB (PointLLM-7B, float16) to 52GB (PointLLM-13B, float32).
  • Links: Online Demo (currently offline), Paper, Checkpoints.
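The two ends of the GPU memory range quoted above are consistent with a simple weights-only estimate: parameter count times bytes per parameter. The sketch below assumes nominal parameter counts of 7 and 13 billion (read off the model names) and 1 GB = 10^9 bytes; it ignores activations, the point encoder, and framework overhead, so actual usage may differ. The function name `weight_memory_gb` is hypothetical.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"float16": 2, "float32": 4}

def weight_memory_gb(num_params_billion, dtype):
    """Weights-only memory estimate in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    """
    return num_params_billion * BYTES_PER_PARAM[dtype]

print(weight_memory_gb(7, "float16"))   # prints 14
print(weight_memory_gb(13, "float32"))  # prints 52
```

The same estimate suggests intermediate configurations, for example the 13B model in float16, would fall between the two quoted figures.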

Highlighted Details

  • ECCV 2024 Best Paper Candidate.
  • Trained on 730K instruction-following point-text pairs.
  • Supports open-vocabulary classification and 3D object captioning.
  • Evaluation includes traditional metrics and GPT-4 based assessment.

Maintenance & Community

The project is actively maintained, with recent updates including a camera-ready paper version and ongoing research recruitment. Community contributions for supporting new LLMs are welcomed.

Licensing & Compatibility

Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). This license restricts commercial use and requires derivative works to be shared under the same license.

Limitations & Caveats

The online demo is currently offline, and the CC BY-NC-SA 4.0 license prohibits commercial use. The model is trained on a specific point cloud format (colored point clouds sampled to 8192 points), so other datasets may need preprocessing to match.
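As one example of such preprocessing, a point cloud of arbitrary size can be resampled to 8192 points. The sketch below uses uniform random sampling, which is an assumption: the project may rely on a different sampling strategy or additional normalization, so check its data pipeline before reusing this. The function name `resample_points` and the (x, y, z, r, g, b) tuple layout are hypothetical.

```python
import random

def resample_points(points, num_points=8192, seed=0):
    """Return exactly num_points points from a colored point cloud.

    points: list of (x, y, z, r, g, b) tuples.
    Larger clouds are subsampled without replacement; smaller clouds keep
    every original point and are padded with randomly chosen duplicates.
    """
    if not points:
        raise ValueError("cannot resample an empty point cloud")
    rng = random.Random(seed)
    if len(points) >= num_points:
        return rng.sample(points, num_points)
    padding = rng.choices(points, k=num_points - len(points))
    return list(points) + padding

# Usage: a synthetic cloud with 10,000 gray points along the x axis.
cloud = [(float(i), 0.0, 0.0, 0.5, 0.5, 0.5) for i in range(10000)]
print(len(resample_points(cloud)))        # prints 8192
print(len(resample_points(cloud[:100])))  # prints 8192
```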

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 72 stars in the last 90 days
