Multimodal LLM for 3D content
Top 66.4% on sourcepulse
ShapeLLM-Omni is a native multimodal large language model designed for 3D generation and understanding tasks. It targets researchers and developers working with 3D data, offering capabilities for creating and interpreting 3D assets through natural language prompts. The project aims to bridge the gap between text-based instructions and 3D content creation.
How It Works
ShapeLLM-Omni integrates a vision-language model with 3D representations, likely leveraging a VQ-VAE for discretizing 3D shapes into tokens. This approach allows the LLM to process and generate 3D data in a manner analogous to how it handles text and images, enabling direct manipulation and creation of 3D objects via textual commands.
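To make the tokenization idea concrete, below is a minimal sketch of a VQ-VAE-style nearest-code lookup that turns continuous 3D latents into discrete token ids. The function name, tensor shapes, and codebook size are illustrative assumptions, not ShapeLLM-Omni's actual implementation.

# Minimal, illustrative VQ-VAE-style quantization of 3D latents into token ids.
# Names and shapes are assumptions for exposition, not ShapeLLM-Omni's code.
import torch

def quantize_shape_latent(latents: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    # latents:  (N, D) continuous vectors from a 3D encoder (e.g., one per voxel patch)
    # codebook: (K, D) learned code vectors
    # returns:  (N,) integer token ids usable in an LLM's input sequence
    dists = torch.cdist(latents, codebook)  # pairwise L2 distances, shape (N, K)
    return dists.argmin(dim=-1)             # index of the nearest code per latent

# Toy usage: 512 latent vectors of dimension 64 against an 8192-entry codebook
latents = torch.randn(512, 64)
codebook = torch.randn(8192, 64)
shape_tokens = quantize_shape_latent(latents, codebook)
print(shape_tokens.shape, shape_tokens.dtype)  # torch.Size([512]) torch.int64

During generation the process would run in reverse: the LLM emits shape-token ids, which index into the codebook and are decoded back into a 3D representation.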
Quick Start & Requirements
pip install -r requirements.txt
python app.py
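The summary does not spell out hardware requirements. Assuming the project builds on PyTorch (an assumption based on typical multimodal-LLM stacks), a quick sanity check that a CUDA-capable GPU is visible before launching the demo could be:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"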
Highlighted Details
Maintenance & Community
The project is associated with Tsinghua University, ShengShu, and Peking University. Key contributors are listed as Junliang Ye, Zhengyi Wang, and Ruowen Zhao.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The repository's to-do list still includes releasing the full 3D-Alpaca dataset, the training code, and model weights with multi-turn dialogue and 3D-editing capabilities, so the project remains under active development and may not yet be feature-complete.