StyleCLIP enables text-driven manipulation of StyleGAN-generated imagery by leveraging CLIP's joint text-image embedding. It offers three methods for editing images from textual descriptions: latent vector optimization, a trained latent mapper, and global directions in StyleSpace, providing flexible control over image generation and modification.
How It Works
StyleCLIP integrates StyleGAN's generative capabilities with CLIP's text-image alignment. The latent vector optimization method uses a CLIP-based loss to adjust latent vectors according to text prompts. The latent mapper learns to infer text-guided latent vector residuals for faster, more stable edits. Global directions identify input-agnostic manipulations in StyleGAN's style space, allowing interactive, text-driven adjustments.
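To make the first method concrete, here is a minimal sketch of CLIP-guided latent optimization. The generator `G` and starting latent `w_init` are placeholders for a pretrained StyleGAN2 generator and a sampled or inverted W+ code (they are not StyleCLIP's actual API), and the identity loss used in the paper is omitted for brevity:

```python
# Sketch of latent-vector optimization under a CLIP loss. Assumed placeholders:
#   G      -- pretrained StyleGAN2 generator: W+ latent (1, 18, 512) -> RGB image in [-1, 1]
#   w_init -- W+ latent of the image to edit (sampled, or inverted with e.g. e4e)
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()

# Encode the target text prompt once; gradients only flow into the latent.
text = clip.tokenize(["a face with blue hair"]).to(device)
with torch.no_grad():
    text_features = clip_model.encode_text(text)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

w = w_init.detach().clone().requires_grad_(True)
optimizer = torch.optim.Adam([w], lr=0.1)

# CLIP's expected input normalization statistics.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(300):
    img = G(w)                                   # (1, 3, H, W) in [-1, 1]
    img = F.interpolate((img + 1) / 2, size=(224, 224), mode="bilinear", align_corners=False)
    image_features = clip_model.encode_image((img - mean) / std)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)

    clip_loss = 1 - (image_features * text_features).sum(dim=-1).mean()
    latent_loss = ((w - w_init) ** 2).mean()     # keeps the edit close to the source image
    loss = clip_loss + 0.008 * latent_loss       # 0.008 is an illustrative weight

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The latent mapper replaces this per-image optimization with a network trained once per prompt to predict latent residuals. The toy module below illustrates the idea only; the real mapper splits the W+ code into coarse, medium, and fine groups with a separate sub-mapper for each:

```python
import torch.nn as nn

class ToyLatentMapper(nn.Module):
    """Illustrative stand-in for StyleCLIP's mapper (not its actual module):
    predicts a residual that nudges a W+ latent toward a fixed text prompt."""
    def __init__(self, dim=512, depth=4):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(dim, dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, w):                 # w: (batch, 18, 512) W+ code
        return w + 0.1 * self.net(w)      # small, latent-dependent edit
```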
Quick Start & Requirements
- Installation: Requires Anaconda, CLIP, and a StyleGAN2 implementation (PyTorch or TensorFlow, depending on the method).
- Dependencies:
  - CLIP: `pip install ftfy regex tqdm gdown git+https://github.com/openai/CLIP.git`
  - PyTorch methods: PyTorch 1.7.1 with `cudatoolkit=<CUDA_VERSION>`
  - TensorFlow methods: TensorFlow 1.14 or 1.15 (`tensorflow-gpu==1.14`)
- Pretrained StyleGAN2/StyleGAN2-ADA models are required.
- The mapper method requires pretrained facial recognition network weights (used for its identity loss).
- Setup: Expect large downloads of pretrained models and, for some methods, datasets.
- Resources: Preprocessing steps for global directions can take several hours.
- Links: Paper, Replicate, Colab Notebooks
Highlighted Details
- Supports three distinct manipulation techniques: latent optimization, latent mapper, and global directions (a minimal sketch of the global-direction text encoding follows this list).
- Offers both interactive GUI and programmatic control.
- Compatible with custom StyleGAN2 and StyleGAN2-ADA models and custom images.
- Methods can be applied to both generated and real images (after inversion).
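Since global directions is the least intuitive of the three techniques, here is a sketch of its first stage, assuming only OpenAI's `clip` package: the edit is encoded as a normalized difference between CLIP text embeddings of a target and a neutral prompt, averaged over prompt templates. Mapping this direction onto StyleSpace channels via per-channel relevance estimates is omitted, and the two templates shown are illustrative stand-ins for the larger template set used in practice:

```python
# Compute an input-agnostic text direction in CLIP embedding space.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.eval()

templates = ["a photo of a {}", "a cropped photo of a {}"]  # illustrative subset

def embed(text):
    """Average the normalized CLIP embeddings of a phrase over prompt templates."""
    tokens = clip.tokenize([t.format(text) for t in templates]).to(device)
    with torch.no_grad():
        feats = model.encode_text(tokens)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)

delta_t = embed("face with glasses") - embed("face")
delta_t = delta_t / delta_t.norm()   # normalized edit direction
```

Because `delta_t` depends only on the prompts, the resulting manipulation is input-agnostic: once mapped to StyleSpace, the same direction can be applied to any image's style code at interactive rates.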
Maintenance & Community
- Last update: October 2022 (added global direction support for PyTorch).
- Primary author: Or Patashnik.
- Citation available for academic use.
Licensing & Compatibility
- The repository itself does not explicitly state a license in the README.
- StyleGAN implementations used may have their own licenses.
- CLIP is released by OpenAI under the permissive MIT license.
Limitations & Caveats
- The TensorFlow implementation for global directions requires older TF versions (1.14/1.15), potentially causing compatibility issues with modern environments.
- Preprocessing for global directions can be time-consuming.
- Editing real images requires an inversion step using external tools like e4e.