ShapeLLM-Omni by JAMESYJL

Multimodal LLM for 3D content

created 2 months ago
463 stars

Top 66.4% on sourcepulse

View on GitHub
Project Summary

ShapeLLM-Omni is a native multimodal large language model designed for 3D generation and understanding tasks. It targets researchers and developers working with 3D data, offering capabilities for creating and interpreting 3D assets through natural language prompts. The project aims to bridge the gap between text-based instructions and 3D content creation.

How It Works

ShapeLLM-Omni integrates a vision-language model with 3D representations, using a 3D VQ-VAE (the released 3DVQVAE) to discretize 3D shapes into sequences of tokens. This lets the LLM process and generate 3D data in the same way it handles text and image tokens, enabling direct creation and editing of 3D objects from textual commands.
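
For intuition, here is a minimal, hypothetical sketch of VQ-style shape tokenization in PyTorch: a small 3D convolutional encoder maps a voxel grid to latent features, and a nearest-neighbor lookup against a learned codebook yields discrete token ids. All class names, dimensions, and the codebook size are illustrative assumptions, not the project's actual 3DVQVAE implementation.

```python
# Illustrative sketch only; not the ShapeLLM-Omni / 3DVQVAE implementation.
import torch
import torch.nn as nn


class ToyShapeTokenizer(nn.Module):
    """Encode a voxel occupancy grid into discrete codebook indices ("shape tokens")."""

    def __init__(self, codebook_size=8192, latent_dim=64):
        super().__init__()
        # Tiny 3D conv encoder: 1-channel voxel grid -> latent feature volume.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv3d(32, latent_dim, kernel_size=4, stride=2, padding=1),
        )
        # Learned codebook of discrete shape embeddings.
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def forward(self, voxels):
        # voxels: (B, 1, D, H, W) occupancy grid.
        z = self.encoder(voxels)                           # (B, C, d, h, w)
        B, C, d, h, w = z.shape
        z_flat = z.permute(0, 2, 3, 4, 1).reshape(-1, C)   # (B*d*h*w, C)
        # Nearest-codebook-entry lookup -> discrete token ids.
        dists = torch.cdist(z_flat, self.codebook.weight)  # (N, K)
        token_ids = dists.argmin(dim=-1)                   # (N,)
        return token_ids.view(B, d * h * w)                # one token sequence per shape


# Usage: a 32^3 occupancy grid becomes a sequence of discrete token ids.
tokenizer = ToyShapeTokenizer()
tokens = tokenizer(torch.rand(1, 1, 32, 32, 32))
print(tokens.shape)  # torch.Size([1, 512])
```

In a setup like this, the resulting shape tokens could be interleaved with text tokens so a single autoregressive LLM can both describe and generate 3D objects.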

Quick Start & Requirements

Highlighted Details

  • Released pretrained weights for ShapeLLM-Omni (7B) and 3DVQVAE (see the download sketch after this list).
  • Released 50k high-quality 3D edited data pairs.
  • Code is based on LLaMA-Factory, TRELLIS, PointLLM, Qwen2.5-VL, LLaMA-Mesh, and DeepMesh.
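
Since the repository's quick-start instructions are not reproduced here, the snippet below is only a hedged sketch of how the released checkpoints might be fetched from the Hugging Face Hub using huggingface_hub.snapshot_download. The repository ids are placeholders (assumptions); substitute the identifiers linked from the GitHub README.

```python
# Sketch of fetching the released checkpoints from the Hugging Face Hub.
# The repo ids below are placeholders, not the authors' confirmed identifiers.
from huggingface_hub import snapshot_download

SHAPELLM_REPO = "ORG/ShapeLLM-Omni-7B"   # placeholder repo id (assumption)
VQVAE_REPO = "ORG/3DVQVAE"               # placeholder repo id (assumption)

llm_dir = snapshot_download(repo_id=SHAPELLM_REPO)
vqvae_dir = snapshot_download(repo_id=VQVAE_REPO)
print("LLM weights:", llm_dir)
print("3DVQVAE weights:", vqvae_dir)
```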

Maintenance & Community

The project is associated with Tsinghua University, ShengShu, and Peking University. Key contributors are listed as Junliang Ye, Zhengyi Wang, and Ruowen Zhao.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project lists open "Todo" items, including release of the full 3D-Alpaca dataset, the training code, and model weights with multi-turn dialogue and 3D editing capabilities. It is still under active development and may not be feature-complete.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 465 stars in the last 90 days
