CAD-MLLM by CAD-MLLM

Unifying multimodal inputs for CAD generation with MLLMs

Created 1 year ago
257 stars

Top 98.2% on SourcePulse

Project Summary

CAD-MLLM addresses the challenge of unifying multimodality-conditioned Computer-Aided Design (CAD) generation by leveraging Multimodal Large Language Models (MLLMs). It targets researchers and engineers in the CAD and AI fields, providing a novel framework to generate complex CAD models from diverse inputs like text and images, aiming to streamline design processes.

How It Works

The project integrates MLLMs to enable conditional CAD generation, allowing for more intuitive and flexible design workflows. It builds upon the DeepCAD framework for robust data preprocessing, including conversion to STEP formats, point cloud sampling, and image rendering. This approach aims to unify various conditioning modalities for a more comprehensive CAD generation system.

Quick Start & Requirements

  • Installation: Requires Git, Conda, Python 3.8, and pythonocc-core=7.8.1. Setup involves initializing submodules, creating a Conda environment, installing dependencies from ./3rd_party/DeepCAD/requirements.txt, and installing pythonocc-core.
  • Dataset: The Omni-CAD dataset (model descriptions and text captions) must be downloaded from Hugging Face.
  • Data Preprocessing: A multi-step process includes exporting CAD data to STEP format, sampling point clouds, and rendering images using tools like PythonOCC, Blender, Mitsuba3, or Open3D.
  • Links: Evaluation code and guidance are available at CAD-MLLM-metrics. A project page is referenced for demonstrations.

Highlighted Details

  • Introduces novel evaluation metrics: Segment Error (SegE), Dangling Edge Length (DangEL), Self Intersection Ratio (SIR), and Flux Enclosure Error (FluxEE).
  • Released the comprehensive Omni-CAD dataset for multimodality-conditioned CAD generation research.
  • Provides scripts for converting CAD data to STEP, sampling point clouds, and rendering images.
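To make the metrics concrete, here is one plausible reading of Dangling Edge Length (DangEL) for a triangle mesh: the total length of edges used by exactly one face, i.e. open boundary edges. This is an illustrative sketch only; the official metric implementations live in the CAD-MLLM-metrics repository, and the function name `dangling_edge_length` is hypothetical.

```python
import numpy as np
from collections import Counter

def dangling_edge_length(vertices, faces):
    """Total length of edges used by exactly one triangle (open boundary edges)."""
    edge_count = Counter()
    for f in faces:
        for i, j in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0])):
            edge_count[tuple(sorted((i, j)))] += 1
    boundary = [e for e, count in edge_count.items() if count == 1]
    return sum(np.linalg.norm(vertices[i] - vertices[j]) for i, j in boundary)

# A closed tetrahedron has no dangling edges; a lone triangle has three.
verts = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
tet = np.array([[0, 1, 2], [0, 3, 1], [0, 2, 3], [1, 3, 2]])
tri = np.array([[0, 1, 2]])
print(dangling_edge_length(verts, tet))  # → 0.0 (watertight)
print(dangling_edge_length(verts, tri))  # → 2 + sqrt(2) ≈ 3.414
```

A watertight solid scores zero, so under this reading DangEL penalizes generated models whose boundary representation fails to close up.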

Maintenance & Community

The project is led by researchers from ShanghaiTech University, Transcengram, DeepSeek AI, and the University of Hong Kong. Acknowledgements are made to the DeepCAD project. Key components like inference and training code are still pending release according to the project's to-do list. No community channels (e.g., Discord, Slack) are explicitly listed.

Licensing & Compatibility

The provided README does not specify a software license. This absence creates ambiguity regarding usage rights, commercial application, and derivative works.

Limitations & Caveats

The inference and training code are not yet publicly available, limiting immediate practical application for model deployment or further development. The project appears to be in an active development phase, with core functionalities still to be released.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Eric Zhang (Founding Engineer at Modal), and 13 more.

flux by black-forest-labs

Top 0.1% · 25k stars
Inference code for FLUX image generation & editing models
Created 1 year ago · Updated 8 months ago