Point-Bind_Point-LLM by ZiyuGuo99

3D multi-modality model aligning point clouds with language models

created 2 years ago
442 stars

Top 68.9% on sourcepulse

Project Summary

This project introduces Point-Bind and Point-LLM, which enable 3D point cloud understanding and generation by aligning point clouds with other modalities and with large language models. It is aimed at researchers and developers working with 3D data and LLMs, and supports 3D reasoning and instruction following without requiring any 3D instruction dataset.

How It Works

Point-Bind aligns 3D point clouds with 2D images, language, audio, and video in a joint embedding space. Point-LLM builds on this shared representation: it is a LLaMA model fine-tuned with parameter-efficient techniques on public vision-language data, which lets it perform 3D multi-modal reasoning while keeping both training data and trainable parameters to a minimum.
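A joint embedding space of this kind is commonly learned with a CLIP-style contrastive objective: paired samples (for example a point cloud and an image of the same object) are pulled together, and unpaired samples in the batch are pushed apart. The sketch below shows a symmetric InfoNCE loss in plain Python, assuming such an objective; the function names, the 0.07 temperature, and the toy vectors are illustrative assumptions, not code from the Point-Bind repository.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    # Entry [i][j] is the cosine similarity between a[i] and b[j].
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    # Mean cross-entropy where the correct "class" for row i is column i
    # (i.e. the i-th point cloud is paired with the i-th sample of the other modality).
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(points: List[Vector], other: List[Vector],
                       temperature: float = 0.07) -> float:
    # Hypothetical helper: averages the 3D->other and other->3D contrastive losses.
    sims = cosine_matrix(points, other)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy usage: correctly paired embeddings should give a lower loss than shuffled pairs.
points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
images = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled = [images[1], images[2], images[0]]
print(symmetric_info_nce(points, images) < symmetric_info_nce(points, shuffled))  # True
```

Minimizing this loss over many batches is what makes embeddings from different modalities directly comparable by cosine similarity, which downstream tasks such as retrieval or conditioning an LLM can rely on.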

Quick Start & Requirements

  • Install: Follow Install.md for environment setup and checkpoint acquisition.
  • Prerequisites: Requires LLaMA backbone weights (obtained through the official request form) and a CUDA-enabled GPU.
  • Demo: An online demo is available as part of ImageBind-LLM. To host the demo locally, run python gradio_app.py --llama_dir /path/to/llama_model_weights.
  • Inference: Example scripts demo_text_3d.py, demo_audio_3d.py, and Point-LLM/demo.py are provided.

Highlighted Details

  • Achieves 76.3% and 78.0% zero-shot classification accuracy on ModelNet40 with Point-BERT and I2P-MAE as the 3D encoder, respectively, outperforming prior methods.
  • Supports both English and Chinese for 3D instruction following.
  • Fine-tunes Point-LLM with parameter-efficient techniques using only public vision-language data.
  • Enables reasoning across combined 3D and multi-modal inputs (e.g., point cloud + audio).
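Zero-shot classification in a shared embedding space is usually done by embedding one text prompt per class and choosing the class whose text embedding is most similar to the point cloud embedding; combined inputs such as point cloud + audio can be represented by adding their normalized embeddings. The sketch below illustrates both ideas in plain Python. It assumes embeddings have already been produced by the encoders; the toy vectors, the class prompts they stand for, and the helper names (zero_shot_classify, fuse) are hypothetical, and summation is an assumed fusion rule, not necessarily the exact one used by Point-Bind or Point-LLM.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    # Unit-length copy of v.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity between two embeddings.
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def zero_shot_classify(point_emb: Vector, class_text_embs: Dict[str, Vector]) -> str:
    # Predict the class whose text-prompt embedding is closest to the 3D embedding.
    # No 3D labels are needed: the class set is defined only by the text prompts.
    return max(class_text_embs, key=lambda name: cosine(point_emb, class_text_embs[name]))


def fuse(*embs: Vector) -> Vector:
    # Hypothetical fusion of several modalities: sum the L2-normalized
    # embeddings, then renormalize so the result lives on the unit sphere.
    summed = [sum(vals) for vals in zip(*(normalize(e) for e in embs))]
    return normalize(summed)


# Toy usage with made-up 3-dimensional embeddings.
classes = {
    "chair": [1.0, 0.1, 0.0],     # stands in for the embedding of "a point cloud of a chair"
    "airplane": [0.0, 1.0, 0.2],
    "car": [0.1, 0.0, 1.0],
}
point = [0.8, 0.2, 0.1]
print(zero_shot_classify(point, classes))  # chair

audio = [0.0, 1.0, 0.0]
query = fuse([1.0, 0.0, 0.0], audio)       # a combined 3D + audio query vector
```

The fused vector is equally similar to both of its inputs, so it can be used as a single query that carries information from the point cloud and the audio clip at once.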

Maintenance & Community

  • Project paper available on arXiv.
  • Inference code released.
  • Contact emails provided for inquiries.

Licensing & Compatibility

  • License details are not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify the project's license, which is crucial for commercial adoption. Obtaining LLaMA weights requires submitting a request form, and the README warns against downloading weights from unofficial sources, which may contain malicious code.

Health Check
Last commit: 1 year ago
Responsiveness: 1 week
Pull Requests (30d): 0
Issues (30d): 0
Star History: 9 stars in the last 90 days
