Text-to-shape generation research paper (CVPR 2022)
CLIP-Forge addresses the challenge of generating 3D shapes from natural language descriptions, a task hindered by the scarcity of paired text-shape data. Targeting researchers and developers in 3D content creation and AI, it enables zero-shot text-to-shape generation by leveraging unlabeled shape datasets and pre-trained image-text models like CLIP.
How It Works
CLIP-Forge employs a two-stage training process. The first stage trains an autoencoder on an unlabeled shape dataset to learn a latent shape representation. The second stage trains a conditional normalizing flow that maps CLIP image embeddings of rendered shapes into that latent space. At inference time, CLIP's shared image-text embedding space lets a text prompt stand in for an image embedding, so shapes corresponding to the prompt can be generated without any direct text-shape supervision. This approach avoids costly inference-time optimization and supports generating multiple shapes per query.
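As a rough illustration of that inference path, the sketch below uses the real openai/clip API to embed a prompt, but substitutes simplified placeholder modules (ShapeDecoder, LatentFlow) for the repository's trained decoder and conditional normalizing flow; the class names, dimensions, and sampling logic are illustrative assumptions, not CLIP-Forge's actual code.

import torch
import torch.nn as nn
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for the stage-1 decoder: maps a 128-d shape latent to a 32^3 occupancy grid.
# Illustrative only, not the repository's actual module.
class ShapeDecoder(nn.Module):
    def __init__(self, latent_dim=128, grid=32):
        super().__init__()
        self.grid = grid
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, grid ** 3))

    def forward(self, z):
        return torch.sigmoid(self.net(z)).view(-1, self.grid, self.grid, self.grid)

# Stand-in for the stage-2 model: the paper uses a conditional normalizing flow over
# shape latents; a single affine conditioning layer plays that role here for brevity.
class LatentFlow(nn.Module):
    def __init__(self, latent_dim=128, cond_dim=512):
        super().__init__()
        self.shift = nn.Linear(cond_dim, latent_dim)
        self.scale = nn.Linear(cond_dim, latent_dim)

    @torch.no_grad()
    def sample(self, cond, num_samples):
        eps = torch.randn(num_samples, self.shift.out_features, device=cond.device)
        return eps * torch.exp(self.scale(cond)) + self.shift(cond)

clip_model, _ = clip.load("ViT-B/32", device=device)  # real openai/clip API
decoder = ShapeDecoder().to(device).eval()
flow = LatentFlow().to(device).eval()

@torch.no_grad()
def text_to_shapes(prompt, num_shapes=4):
    # 1. Embed the text prompt with CLIP (no text-shape pairs required).
    tokens = clip.tokenize([prompt]).to(device)
    cond = clip_model.encode_text(tokens).float()
    cond = cond / cond.norm(dim=-1, keepdim=True)   # normalized CLIP text embedding
    cond = cond.repeat(num_shapes, 1)
    # 2. Sample shape latents conditioned on the embedding, then 3. decode them.
    z = flow.sample(cond, num_samples=num_shapes)
    return decoder(z)

voxels = text_to_shapes("a round table with four legs")
print(voxels.shape)  # (4, 32, 32, 32) occupancy grids

Because sampling and decoding are cheap feed-forward passes, multiple candidate shapes per prompt come essentially for free, which is the property the section above describes.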
Quick Start & Requirements
Create and activate the conda environment (conda env create -f environment.yaml, then conda activate clip_forge). Install PyTorch 1.7.1+ with CUDA 11.0 (conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0), OpenAI's CLIP (pip install git+https://github.com/openai/CLIP.git), and scikit-learn (pip install sklearn). Download the pretrained models with wget https://clip-forge-pretrained.s3.us-west-2.amazonaws.com/exps.zip.
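A brief sanity check (not a script from the repository) can confirm the environment is usable; it assumes the installs above completed and uses only standard openai/clip calls.

import torch
import clip

# Confirm the GPU and the CLIP install from the steps above are working.
device = "cuda" if torch.cuda.is_available() else "cpu"
print("CUDA available:", torch.cuda.is_available())

model, preprocess = clip.load("ViT-B/32", device=device)
tokens = clip.tokenize(["a chair that looks like an avocado"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
print("Text embedding shape:", tuple(text_features.shape))  # (1, 512) for ViT-B/32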
Highlighted Details
Maintenance & Community
The project is associated with the Autodesk AI Lab. The README mentions ongoing work on point cloud code and pretrained models, but the repository's last activity was roughly two years ago and it appears inactive.
Licensing & Compatibility
The repository does not explicitly state a license. The code is provided for research purposes, and citation is requested.
Limitations & Caveats
The released model is trained on ShapeNet, so queries are best kept within its 13 training categories. While the method is expected to scale with more data, the limited availability of public 3D data is noted as a constraint, and the point cloud code is described as "semi done."