Large Language 3D Assistant for visual and textual interactions in 3D environments
Top 90.0% on sourcepulse
LL3DA is a Large Language 3D Assistant designed for omni-3D understanding, reasoning, and planning in complex 3D environments. It targets researchers and developers working with 3D vision-language models, offering direct point cloud input processing to overcome the limitations of 2D feature projection methods.
How It Works
LL3DA directly processes point cloud data, a permutation-invariant 3D representation, to comprehend and respond to textual instructions and visual prompts. This approach avoids the computational overhead and performance degradation associated with projecting 2D features into 3D space, enabling more accurate understanding and disambiguation in cluttered scenes.
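The permutation invariance mentioned above is the key property of point clouds as an input representation: a cloud is an unordered set, so reordering its points must not change the features a model extracts. A minimal stand-alone sketch (toy data and a toy symmetric aggregation, not LL3DA's actual encoder) illustrates the idea behind PointNet-style max-pooling:

```python
import random

# A point cloud is an unordered set of XYZ points: reordering the rows must
# not change the scene it describes. Symmetric aggregations (like the max-pool
# used in PointNet-style encoders) give the same global feature for any
# ordering of the points.
random.seed(0)
points = [(random.random(), random.random(), random.random())
          for _ in range(1024)]            # toy cloud: 1024 points, xyz

def global_feature(pts):
    # Per-point "features" (here just raw coordinates) reduced by a symmetric
    # max over the point axis -> a permutation-invariant global descriptor.
    return tuple(max(p[i] for p in pts) for i in range(3))

shuffled = random.sample(points, len(points))   # same set, different order
assert global_feature(points) == global_feature(shuffled)
```

Any symmetric reduction (max, sum, mean) works here; projecting 2D image features into 3D instead would tie the representation to camera viewpoints, which is the overhead LL3DA avoids.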
Quick Start & Requirements
Compile pointnet2 and the accelerated giou from source. Dependencies:

- torch=1.13.1+cu116
- transformers>=4.37.0
- h5py
- scipy
- cython
- plyfile
- trimesh>=2.35.39,<2.35.40
- networkx>=2.2,<2.3

Pretrained weights (opt-1.3b) need to be downloaded.
Highlighted Details
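The build and dependency requirements above could be scripted roughly as follows; note the extension paths (`third_party/pointnet2`, `utils`) and the Cython build script name are assumptions about the repo layout, not confirmed by this summary:

```shell
# Setup sketch based on the listed requirements; pin torch to the stated CUDA build.
pip install torch==1.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install "transformers>=4.37.0" h5py scipy cython plyfile \
            "trimesh>=2.35.39,<2.35.40" "networkx>=2.2,<2.3"

# Compile the pointnet2 CUDA ops from source (path assumed).
cd third_party/pointnet2 && python setup.py install && cd -

# Build the accelerated giou Cython extension (path and script name assumed).
cd utils && python cython_compile.py build_ext --inplace && cd -
```

The pinned torch build must match the local CUDA toolkit, since the pointnet2 ops are compiled against it.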
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats