Clip-Forge by AutodeskAILab

Text-to-shape generation research paper (CVPR 2022)

created 3 years ago
394 stars

Top 74.2% on sourcepulse

Project Summary

CLIP-Forge addresses the challenge of generating 3D shapes from natural language descriptions, a task hindered by the scarcity of paired text-shape data. Targeting researchers and developers in 3D content creation and AI, it enables zero-shot text-to-shape generation by leveraging unlabeled shape datasets and pre-trained image-text models like CLIP.

How It Works

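CLIP-Forge uses a two-stage training process. The first stage trains an autoencoder on an unlabeled shape dataset to learn compact shape embeddings. The second stage trains a conditional normalizing flow that models those shape embeddings, conditioned on CLIP image embeddings of rendered views of each shape. Because CLIP places images and text in a shared embedding space, a text embedding can replace the image embedding at inference time, so shapes are generated from prompts without any paired text-shape supervision. Generation is a feed-forward pass, which avoids costly inference-time optimization, and sampling different latent vectors yields multiple shapes per query.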
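As a rough illustration of the conditioning idea, the sketch below implements a toy single-layer conditional affine flow in plain Python. It stands in for the conditional normalizing flow described in the CLIP-Forge paper and is not taken from the repository: the class name, the function names, the fixed random weights (in place of trained networks), and the 4-dimensional "CLIP" vector are all assumptions made for the example.

```python
import math
import random

# Hypothetical toy stand-ins; none of these names come from the CLIP-Forge codebase.


def normalize(vec):
    """Scale a vector to unit length, as is commonly done with CLIP embeddings."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ConditionalAffineFlow:
    """One invertible affine layer: emb = mu(cond) + exp(log_sigma(cond)) * z."""

    def __init__(self, dim, cond_dim, seed=0):
        rng = random.Random(seed)
        # Fixed random linear maps stand in for trained conditioning networks.
        self.w_mu = [[rng.gauss(0, 0.5) for _ in range(cond_dim)] for _ in range(dim)]
        self.w_ls = [[rng.gauss(0, 0.1) for _ in range(cond_dim)] for _ in range(dim)]

    def _params(self, cond):
        mu = [sum(w * c for w, c in zip(row, cond)) for row in self.w_mu]
        log_sigma = [sum(w * c for w, c in zip(row, cond)) for row in self.w_ls]
        return mu, log_sigma

    def forward(self, emb, cond):
        """Shape embedding -> latent z (the direction used for likelihood training)."""
        mu, log_sigma = self._params(cond)
        return [(e - m) * math.exp(-s) for e, m, s in zip(emb, mu, log_sigma)]

    def inverse(self, z, cond):
        """Latent z -> shape embedding (the direction used at generation time)."""
        mu, log_sigma = self._params(cond)
        return [m + math.exp(s) * zi for zi, m, s in zip(z, mu, log_sigma)]


def generate(flow, cond, num_shapes, dim, seed=0):
    """Draw several latents for one condition, giving several shape embeddings."""
    rng = random.Random(seed)
    cond = normalize(cond)
    return [
        flow.inverse([rng.gauss(0, 1) for _ in range(dim)], cond)
        for _ in range(num_shapes)
    ]
```

In this sketch, training would call forward with a condition derived from an image, while generation calls generate with a condition derived from text; a separate decoder (omitted here) would turn each returned embedding into a voxel grid or point cloud. For example, generate(ConditionalAffineFlow(dim=3, cond_dim=4), [0.2, -0.1, 0.7, 0.4], num_shapes=3, dim=3) returns three different 3-dimensional embeddings for the same placeholder condition.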

Quick Start & Requirements

  • Install: Create and activate the conda environment (conda env create -f environment.yaml, then conda activate clip_forge). Next, install PyTorch 1.7.1+ with CUDA 11.0 (conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0), OpenAI's CLIP (pip install git+https://github.com/openai/CLIP.git), and scikit-learn (pip install scikit-learn).
  • Data: Download the pretrained experiments (wget https://clip-forge-pretrained.s3.us-west-2.amazonaws.com/exps.zip), then unzip exps.zip.
  • Prerequisites: Python 3.x, PyTorch 1.7.1+, CUDA 11.0+, an unlabeled shape dataset (e.g., ShapeNet).
  • Docs: Paper
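
On systems without wget, the download-and-unzip step could be done with the Python standard library instead. This is a minimal sketch, not part of the repository: the function name fetch_and_unzip and the choice of destination directory are assumptions, and only the archive URL comes from the Data step above.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# URL taken from the Data step above; everything else here is illustrative.
PRETRAINED_URL = "https://clip-forge-pretrained.s3.us-west-2.amazonaws.com/exps.zip"


def fetch_and_unzip(url: str, dest_dir: str) -> List[str]:
    """Download a zip archive from `url` and extract it into `dest_dir`.

    Returns the names of the archive members that were extracted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Stream to a temporary file so a large archive is never held in memory.
    with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp:
        with urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        tmp_path = Path(tmp.name)
    try:
        with zipfile.ZipFile(tmp_path) as archive:
            archive.extractall(dest)
            return archive.namelist()
    finally:
        # Remove the temporary download whether or not extraction succeeded.
        tmp_path.unlink()
```

Calling fetch_and_unzip(PRETRAINED_URL, ".") from the repository root would place the extracted files in the current directory, mirroring wget followed by unzip.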

Highlighted Details

  • Zero-shot generalization to unseen text prompts.
  • Supports both voxel and pointcloud shape representations.
  • Generates multiple shapes for a single text query.
  • Offers quantitative and qualitative evaluations against baselines.

Maintenance & Community

The project is associated with Autodesk AI Lab. The README mentions ongoing work on pointcloud code and pretrained models.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for research purposes, and citation is requested.

Limitations & Caveats

The model is trained on ShapeNet, so it is likely to perform best on queries that fall within that dataset's 13 categories. The method is believed to scale with more data, but the limited availability of public 3D data is noted as a constraint. The pointcloud code is described as "semi done."

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
