Multimodal LLM for 3D content
Top 66.4% on sourcepulse
ShapeLLM-Omni is a native multimodal large language model designed for 3D generation and understanding tasks. It targets researchers and developers working with 3D data, offering capabilities for creating and interpreting 3D assets through natural language prompts. The project aims to bridge the gap between text-based instructions and 3D content creation.
How It Works
ShapeLLM-Omni integrates a vision-language model with 3D representations, likely leveraging a VQ-VAE for discretizing 3D shapes into tokens. This approach allows the LLM to process and generate 3D data in a manner analogous to how it handles text and images, enabling direct manipulation and creation of 3D objects via textual commands.
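To make the tokenization idea concrete, below is a minimal sketch of a VQ-VAE-style nearest-code lookup that turns continuous 3D latents into discrete token ids. The function name, tensor shapes, and codebook size are illustrative assumptions, not ShapeLLM-Omni's actual implementation.

# Minimal, illustrative VQ-VAE-style quantization of 3D latents into token ids.
# Names and shapes are assumptions for exposition, not ShapeLLM-Omni's code.
import torch

def quantize_shape_latent(latents: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    # latents:  (N, D) continuous vectors from a 3D encoder (e.g., one per voxel patch)
    # codebook: (K, D) learned code vectors
    # returns:  (N,) integer token ids usable in an LLM's input sequence
    dists = torch.cdist(latents, codebook)  # pairwise L2 distances, shape (N, K)
    return dists.argmin(dim=-1)             # index of the nearest code per latent

# Toy usage: 512 latent vectors of dimension 64 against an 8192-entry codebook
latents = torch.randn(512, 64)
codebook = torch.randn(8192, 64)
shape_tokens = quantize_shape_latent(latents, codebook)
print(shape_tokens.shape, shape_tokens.dtype)  # torch.Size([512]) torch.int64

During generation the process would run in reverse: the LLM emits shape-token ids, which index into the codebook and are decoded back into a 3D representation.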
Quick Start & Requirements
pip install -r requirements.txt
python app.py
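The summary does not spell out hardware requirements. Assuming the project builds on PyTorch (an assumption based on typical multimodal-LLM stacks), a quick sanity check that a CUDA-capable GPU is visible before launching the demo could be:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"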
Highlighted Details
Maintenance & Community
The project is associated with Tsinghua University, ShengShu, and Peking University. Key contributors are listed as Junliang Ye, Zhengyi Wang, and Ruowen Zhao.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The repository's to-do list still includes releasing the full 3D-Alpaca dataset, the training code, and model weights with multi-turn dialogue and 3D-editing capabilities, so the project remains under active development and may not yet be feature-complete.