LL3DA by Open3DA

Large Language 3D Assistant for visual, textual interactions in 3D environments

Created 2 years ago

310 stars

Top 86.9% on SourcePulse

Project Summary

LL3DA is a Large Language 3D Assistant designed for omni-3D understanding, reasoning, and planning in complex 3D environments. It targets researchers and developers working with 3D vision-language models, offering direct point cloud input processing to overcome the limitations of 2D feature projection methods.

How It Works

LL3DA takes point clouds, an unordered and therefore permutation-invariant 3D representation, as direct input, and uses them to comprehend and respond to both textual instructions and visual prompts. Working on point clouds directly avoids the computational overhead and performance degradation of projecting multi-view 2D features into 3D space, enabling more accurate understanding and disambiguation in cluttered scenes.
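A toy PointNet-style encoder shows why an encoder that works on raw point clouds must be permutation-invariant. It applies the same small layer to every point and then max-pools each feature channel, so shuffling the input points leaves the global feature unchanged. This is a generic sketch, not LL3DA's actual scene encoder. The 8-channel layer, the random weights, and the function names are illustrative assumptions.

```python
import random

# Hypothetical toy encoder for illustration only; not LL3DA's scene encoder.

def shared_point_mlp(point, weights, biases):
    """Apply the same linear layer + ReLU to one (x, y, z) point."""
    return [
        max(0.0, sum(w * c for w, c in zip(row, point)) + b)
        for row, b in zip(weights, biases)
    ]

def encode_point_cloud(points, weights, biases):
    """Per-point shared MLP followed by a channel-wise max over all points."""
    per_point = [shared_point_mlp(p, weights, biases) for p in points]
    # zip(*per_point) groups values by channel; max ignores point order.
    return [max(channel) for channel in zip(*per_point)]

rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]  # 8 channels
biases = [rng.uniform(-0.1, 0.1) for _ in range(8)]
cloud = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]

feature = encode_point_cloud(cloud, weights, biases)

shuffled = cloud[:]
rng.shuffle(shuffled)  # same points, different order
feature_shuffled = encode_point_cloud(shuffled, weights, biases)

print(feature == feature_shuffled)  # True
```

A symmetric pooling function such as max (or sum/mean) is what makes the result independent of point order. Real encoders stack many such layers with local grouping, but the invariance argument is the same.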

Quick Start & Requirements

Install: Requires manual compilation of pointnet2 and accelerated giou from source.
Dependencies: Python 3.8.16, CUDA 11.6, torch==1.13.1+cu116, transformers>=4.37.0, h5py, scipy, cython, plyfile, trimesh>=2.35.39,<2.35.40, networkx>=2.2,<2.3.
Data: Requires ScanNet V2 dataset, ScanRefer, Nr3D, ScanQA, and 3D-LLM datasets. Pre-processed ScanNet data is available.
Weights: BERT embeddings and pre-trained LLM weights (e.g., opt-1.3b) need to be downloaded.
Setup Time: Significant time required for data preparation and dependency compilation.
Links: Project Page, arXiv Paper, YouTube, HuggingFace Demo (WIP).
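Setup is slow, so it can be worth confirming that the installed packages match the pinned versions before compiling anything. The sketch below is not part of the LL3DA repository. The PINS dict copies the versions listed above, and `satisfies` is a hypothetical helper that implements only a simplified subset of PEP 440: it compares numeric release segments and ignores local tags such as `+cu116` and pre-release suffixes. For anything beyond a quick check, the `packaging` library's `SpecifierSet` is the robust choice.

```python
import platform
import re
from importlib import metadata

# Pins copied from the dependency list; this simplified checker ignores local tags.
PINS = {
    "torch": "==1.13.1",
    "transformers": ">=4.37.0",
    "trimesh": ">=2.35.39,<2.35.40",
    "networkx": ">=2.2,<2.3",
}

OPS = {
    "==": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}

def release_tuple(version):
    """'1.13.1+cu116' -> (1, 13, 1); non-numeric suffixes like 'rc1' are dropped."""
    release = version.split("+", 1)[0]
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '>=2.2,<2.3'."""
    have = release_tuple(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are tested before their one-character prefixes.
        op = next(o for o in ("==", ">=", "<=", ">", "<") if clause.startswith(o))
        want = release_tuple(clause[len(op):])
        width = max(len(have), len(want))
        a = have + (0,) * (width - len(have))  # pad so 2.2 compares equal to 2.2.0
        b = want + (0,) * (width - len(want))
        if not OPS[op](a, b):
            return False
    return True

def check_environment(pins):
    """Return {package: (installed_version_or_None, ok)} for each pin."""
    report = {}
    for name, spec in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, satisfies(installed, spec))
    return report

if __name__ == "__main__":
    py = platform.python_version()
    print(f"{'python':12s} {py:16s} {'==3.8.16':20s} {'ok' if satisfies(py, '==3.8.16') else 'MISMATCH'}")
    for name, (installed, ok) in check_environment(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:12s} {installed or 'not installed':16s} {PINS[name]:20s} {status}")
```

Running this once in the target environment surfaces a wrong torch or transformers build before the longer pointnet2 and giou compilation steps begin.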

Highlighted Details

Achieves state-of-the-art results on 3D Dense Captioning and 3D Question Answering benchmarks.
Supports various decoder-only LLMs including OPT, GPT-2, Llama-2, and Qwen.
Provides training and evaluation scripts for generalist models and task-specific fine-tuning (ScanQA, ScanRefer, Nr3D, OVDet).
Code released for training customized models.

Maintenance & Community

Code fully released March 2024. Accepted to CVPR 2024.
Pre-trained weights available on HuggingFace.

Licensing & Compatibility

MIT License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

The released version has minor differences from the paper's implementation; specific scripts are provided to reproduce reported results.
A local demo interface is still under development.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

0

Issues (30d)

0

Star History

1 star in the last 30 days

Explore Similar Projects

CE3D by Fangkang515

3D scene editor for interactive manipulation via LLM-driven chat

Created 1 year ago

Updated 7 months ago

SceneVerse by scene-verse

Scaling 3D vision-language learning for grounded scene understanding

Created 2 years ago

Updated 9 months ago

Starred by

Ajay Jain (Cofounder of Genmo).

Cap3D by crockwell

Research paper for scalable 3D captioning using pretrained models

Created 2 years ago

Updated 6 months ago

LLaVA-3D by ZCMax

LLM for 2D/3D vision-language tasks

Created 1 year ago

Updated 2 months ago

chat-with-nerf by sled-group

Chat with NeRF enables natural language interaction with NeRF models

Created 2 years ago

Updated 3 months ago

blender-mcp-vxai by VxASI

Blender tool for natural language 3D creation via MCP clients

Created 10 months ago

Updated 9 months ago

Point-Bind_Point-LLM by ZiyuGuo99

3D multi-modality model aligning point clouds with language models

Created 2 years ago

Updated 2 years ago

GPT4Point by Pointcept

3D multi-modality model aligns point clouds with language

Created 2 years ago

Updated 1 year ago

embodied-generalist by embodied-generalist

3D embodied generalist agent (research paper)

Created 2 years ago

Updated 8 months ago

PointLLM by InternRobotics

Multimodal LLM for understanding point clouds

Created 2 years ago

Updated 5 months ago

3D-LLM by UMass-Embodied-AGI

3D-LLM injects 3D data into large language models

Created 2 years ago

Updated 1 year ago

Starred by

Ishaan Jaffer (Cofounder of LiteLLM), Jiaming Song (Chief Scientist at Luma AI), and 6 more.

Grounded-Segment-Anything by IDEA-Research

Framework for open-world visual tasks, combining multiple models

Created 2 years ago

Updated 1 year ago
