3D multi-modality model aligning point clouds with language models
Top 68.9% on sourcepulse
This project introduces Point-Bind and Point-LLM, enabling 3D point cloud understanding and generation through multi-modal alignment with large language models. It targets researchers and developers working with 3D data and LLMs, offering a novel approach to 3D reasoning and instruction following without requiring 3D-specific instruction datasets.
How It Works
Point-Bind establishes a joint embedding space for 3D point clouds, images, audio, and video. This unified representation allows Point-LLM, a fine-tuned LLaMA model, to perform 3D multi-modal reasoning. The approach relies on parameter-efficient fine-tuning and public vision-language data, keeping both data and parameter costs low.
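The snippet below is a minimal sketch of the joint-embedding idea: every modality is mapped into one shared space, so cross-modal matching reduces to cosine similarity. The function names, the 1024-dimensional width, and the random "encoders" are illustrative placeholders, not Point-Bind's actual API.

```python
# Conceptual sketch of a shared embedding space (placeholder encoders, not the real ones).
import numpy as np

EMBED_DIM = 1024  # assumed width of the joint embedding space

def encode_point_cloud(points: np.ndarray) -> np.ndarray:
    """Placeholder: map an (N, 3) point cloud to a unit vector in the joint space."""
    rng = np.random.default_rng(points.shape[0])
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)

def encode_text(prompt: str) -> np.ndarray:
    """Placeholder: map a text prompt to a unit vector in the same joint space."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)

# Because all modalities live in one space, retrieval is just cosine similarity.
cloud = np.zeros((2048, 3))                      # dummy 2048-point cloud
prompts = ["an airplane", "a chair", "a guitar"]
cloud_emb = encode_point_cloud(cloud)
scores = [float(cloud_emb @ encode_text(p)) for p in prompts]
print("best-matching prompt:", prompts[int(np.argmax(scores))])
```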
Quick Start & Requirements
See Install.md for environment setup and checkpoint acquisition. Launch the Gradio demo with:
python gradio_app.py --llama_dir /path/to/llama_model_weights
Inference scripts demo_text_3d.py, demo_audio_3d.py, and Point-LLM/demo.py are provided.
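As a convenience, the launch command above can be wrapped in a small Python script. The --llama_dir flag and gradio_app.py come from the project; the path check and wrapper itself are illustrative assumptions.

```python
# Hedged wrapper around the Gradio demo command shown above (the wrapper is not part of the project).
import subprocess
from pathlib import Path

llama_dir = Path("/path/to/llama_model_weights")  # replace with your local LLaMA weights

if not llama_dir.is_dir():
    raise FileNotFoundError(
        f"LLaMA weights not found at {llama_dir}; request them through the official "
        "form and avoid unofficial downloads, which may contain malicious code."
    )

# Equivalent to: python gradio_app.py --llama_dir /path/to/llama_model_weights
subprocess.run(["python", "gradio_app.py", "--llama_dir", str(llama_dir)], check=True)
```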
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README does not specify the project's license, which is crucial for commercial adoption. Obtaining LLaMA weights requires a specific form submission, and unofficial sources are cautioned against due to potential malicious code.