Text-to-shape generation research paper (CVPR 2022)
CLIP-Forge addresses the challenge of generating 3D shapes from natural language descriptions, a task hindered by the scarcity of paired text-shape data. Targeting researchers and developers in 3D content creation and AI, it enables zero-shot text-to-shape generation by leveraging unlabeled shape datasets and pre-trained image-text models like CLIP.
How It Works
CLIP-Forge employs a two-stage training process. The first stage trains an autoencoder on an unlabeled shape dataset to learn a latent shape representation. The second stage trains a conditional normalizing flow that maps CLIP image embeddings of rendered shapes into that latent space. At inference time, CLIP's shared image-text embedding space lets a text prompt stand in for an image embedding, so shapes corresponding to the prompt can be generated without any direct text-shape supervision. This approach avoids costly inference-time optimization and supports generating multiple shapes per query.
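As a rough illustration of that inference path, the sketch below uses the real openai/clip API to embed a prompt, but substitutes simplified placeholder modules (ShapeDecoder, LatentFlow) for the repository's trained decoder and conditional normalizing flow; the class names, dimensions, and sampling logic are illustrative assumptions, not CLIP-Forge's actual code.

import torch
import torch.nn as nn
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for the stage-1 decoder: maps a 128-d shape latent to a 32^3 occupancy grid.
# Illustrative only, not the repository's actual module.
class ShapeDecoder(nn.Module):
    def __init__(self, latent_dim=128, grid=32):
        super().__init__()
        self.grid = grid
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, grid ** 3))

    def forward(self, z):
        return torch.sigmoid(self.net(z)).view(-1, self.grid, self.grid, self.grid)

# Stand-in for the stage-2 model: the paper uses a conditional normalizing flow over
# shape latents; a single affine conditioning layer plays that role here for brevity.
class LatentFlow(nn.Module):
    def __init__(self, latent_dim=128, cond_dim=512):
        super().__init__()
        self.shift = nn.Linear(cond_dim, latent_dim)
        self.scale = nn.Linear(cond_dim, latent_dim)

    @torch.no_grad()
    def sample(self, cond, num_samples):
        eps = torch.randn(num_samples, self.shift.out_features, device=cond.device)
        return eps * torch.exp(self.scale(cond)) + self.shift(cond)

clip_model, _ = clip.load("ViT-B/32", device=device)  # real openai/clip API
decoder = ShapeDecoder().to(device).eval()
flow = LatentFlow().to(device).eval()

@torch.no_grad()
def text_to_shapes(prompt, num_shapes=4):
    # 1. Embed the text prompt with CLIP (no text-shape pairs required).
    tokens = clip.tokenize([prompt]).to(device)
    cond = clip_model.encode_text(tokens).float()
    cond = cond / cond.norm(dim=-1, keepdim=True)   # normalized CLIP text embedding
    cond = cond.repeat(num_shapes, 1)
    # 2. Sample shape latents conditioned on the embedding, then 3. decode them.
    z = flow.sample(cond, num_samples=num_shapes)
    return decoder(z)

voxels = text_to_shapes("a round table with four legs")
print(voxels.shape)  # (4, 32, 32, 32) occupancy grids

Because sampling and decoding are cheap feed-forward passes, multiple candidate shapes per prompt come essentially for free, which is the property the section above describes.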
Quick Start & Requirements
Create and activate the conda environment (conda env create -f environment.yaml, then conda activate clip_forge). Install PyTorch 1.7.1+ with CUDA 11.0 (conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0), OpenAI's CLIP (pip install git+https://github.com/openai/CLIP.git), and scikit-learn (pip install sklearn). Download the pretrained models with wget https://clip-forge-pretrained.s3.us-west-2.amazonaws.com/exps.zip.
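A brief sanity check (not a script from the repository) can confirm the environment is usable; it assumes the installs above completed and uses only standard openai/clip calls.

import torch
import clip

# Confirm the GPU and the CLIP install from the steps above are working.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("CUDA available:", torch.cuda.is_available())

model, preprocess = clip.load("ViT-B/32", device=device)
tokens = clip.tokenize(["a chair that looks like an avocado"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
print("Text embedding shape:", tuple(text_features.shape))  # (1, 512) for ViT-B/32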
Highlighted Details
Maintenance & Community
The project is associated with the Autodesk AI Lab. The README mentions ongoing work on point cloud code and pretrained models, but the repository's last activity was roughly two years ago and it appears inactive.
Licensing & Compatibility
The repository does not explicitly state a license. The code is provided for research purposes, and citation is requested.
Limitations & Caveats
The released model is trained on ShapeNet, so queries are best kept within its 13 training categories. While the method is expected to scale with more data, the limited availability of public 3D data is noted as a constraint, and the point cloud code is described as "semi done."