CAD-MLLM by CAD-MLLM

Unifying multimodal inputs for CAD generation with MLLMs

Created 1 year ago
257 stars

Top 98.2% on SourcePulse

Project Summary

CAD-MLLM addresses the challenge of unifying multimodality-conditioned Computer-Aided Design (CAD) generation by leveraging Multimodal Large Language Models (MLLMs). It targets researchers and engineers in the CAD and AI fields, providing a novel framework to generate complex CAD models from diverse inputs like text and images, aiming to streamline design processes.

How It Works

The project integrates MLLMs to enable conditional CAD generation, allowing for more intuitive and flexible design workflows. It builds upon the DeepCAD framework for robust data preprocessing, including conversion to STEP formats, point cloud sampling, and image rendering. This approach aims to unify various conditioning modalities for a more comprehensive CAD generation system.

Quick Start & Requirements

  • Installation: Requires Git, Conda, Python 3.8, and pythonocc-core=7.8.1. Setup involves initializing submodules, creating a Conda environment, installing dependencies from ./3rd_party/DeepCAD/requirements.txt, and installing pythonocc-core.
  • Dataset: The Omni-CAD dataset (model descriptions and text captions) must be downloaded from Hugging Face.
  • Data Preprocessing: A multi-step process includes exporting CAD data to STEP format, sampling point clouds, and rendering images using tools like PythonOCC, Blender, Mitsuba3, or Open3D.
  • Links: Evaluation code and guidance are available at CAD-MLLM-metrics. A project page is referenced for demonstrations.

Highlighted Details

  • Introduces novel evaluation metrics: Segment Error (SegE), Dangling Edge Length (DangEL), Self Intersection Ratio (SIR), and Flux Enclosure Error (FluxEE).
  • Released the comprehensive Omni-CAD dataset for multimodality-conditioned CAD generation research.
  • Provides scripts for converting CAD data to STEP, sampling point clouds, and rendering images.
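To make the metrics concrete, here is one plausible reading of Dangling Edge Length (DangEL) for a triangle mesh: the total length of edges used by exactly one face, i.e. open boundary edges. This is an illustrative sketch only; the official metric implementations live in the CAD-MLLM-metrics repository, and the function name `dangling_edge_length` is hypothetical.

```python
import numpy as np
from collections import Counter

def dangling_edge_length(vertices, faces):
    """Total length of edges used by exactly one triangle (open boundary edges)."""
    edge_count = Counter()
    for f in faces:
        for i, j in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0])):
            edge_count[tuple(sorted((i, j)))] += 1
    boundary = [e for e, count in edge_count.items() if count == 1]
    return sum(np.linalg.norm(vertices[i] - vertices[j]) for i, j in boundary)

# A closed tetrahedron has no dangling edges; a lone triangle has three.
verts = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
tet = np.array([[0, 1, 2], [0, 3, 1], [0, 2, 3], [1, 3, 2]])
tri = np.array([[0, 1, 2]])
print(dangling_edge_length(verts, tet))  # → 0.0 (watertight)
print(dangling_edge_length(verts, tri))  # → 2 + sqrt(2) ≈ 3.414
```

A watertight solid scores zero, so under this reading DangEL penalizes generated models whose boundary representation fails to close up.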

Maintenance & Community

The project is led by researchers from ShanghaiTech University, Transcengram, DeepSeek AI, and the University of Hong Kong. Acknowledgements are made to the DeepCAD project. Key components like inference and training code are still pending release according to the project's to-do list. No community channels (e.g., Discord, Slack) are explicitly listed.

Licensing & Compatibility

The provided README does not specify a software license. This absence creates ambiguity regarding usage rights, commercial application, and derivative works.

Limitations & Caveats

The inference and training code are not yet publicly available, limiting immediate practical application for model deployment or further development. The project appears to be in an active development phase, with core functionalities still to be released.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Eric Zhang (Founding Engineer at Modal), and 13 more.

flux by black-forest-labs

Top 0.1% · 25k stars
Inference code for FLUX image generation & editing models
Created 1 year ago · Updated 8 months ago