StyleCLIP by orpatashnik

Text-driven StyleGAN imagery manipulation via CLIP models

created 4 years ago
4,106 stars

Top 12.2% on sourcepulse

View on GitHub
Project Summary

StyleCLIP enables text-driven manipulation of StyleGAN-generated imagery by leveraging CLIP's visual-language understanding. It offers three methods for users to edit images based on textual descriptions: latent vector optimization, a trained latent mapper, and global directions in StyleSpace, providing flexible control over image generation and modification.

How It Works

StyleCLIP integrates StyleGAN's generative capabilities with CLIP's text-image alignment. The latent vector optimization method uses a CLIP-based loss to adjust latent vectors according to text prompts. The latent mapper learns to infer text-guided latent vector residuals for faster, more stable edits. Global directions identify input-agnostic manipulations in StyleGAN's style space, allowing interactive, text-driven adjustments.
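
As a rough illustration of the first method, the sketch below runs CLIP-guided latent optimization: a latent code is updated by gradient descent on a CLIP cosine loss plus an L2 term that keeps the edit close to the starting point. The generator here is a dummy stand-in so the snippet runs end to end (StyleCLIP itself uses a pretrained StyleGAN2 operating on W+ codes), and the prompt, learning rate, and loss weight are illustrative, not the repository's defaults.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()          # keep everything in fp32 for simplicity
clip_model.requires_grad_(False)         # CLIP stays frozen; only the latent is optimized

# Stand-in generator for illustration only; StyleCLIP uses a pretrained
# StyleGAN2 generator, not this toy module.
class DummyGenerator(torch.nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.fc = torch.nn.Linear(latent_dim, 3 * 64 * 64)

    def forward(self, w):
        return torch.tanh(self.fc(w)).view(-1, 3, 64, 64)  # image in [-1, 1]

generator = DummyGenerator().to(device)

# Embed the edit prompt once; it is a constant during optimization.
text = clip.tokenize(["a face with blue hair"]).to(device)
with torch.no_grad():
    text_feat = clip_model.encode_text(text)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

w_init = torch.randn(1, 512, device=device)    # starting latent (sampled or inverted)
w = w_init.clone().requires_grad_(True)
optimizer = torch.optim.Adam([w], lr=0.01)

for step in range(100):
    image = generator(w)
    # CLIP's visual encoder expects 224x224 inputs (normalization omitted in this sketch).
    image = torch.nn.functional.interpolate(
        image, size=(224, 224), mode="bilinear", align_corners=False
    )
    img_feat = clip_model.encode_image(image)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    clip_loss = 1.0 - (img_feat * text_feat).sum()   # cosine-distance CLIP loss
    l2_loss = (w - w_init).pow(2).sum()              # stay close to the starting latent
    loss = clip_loss + 0.008 * l2_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The L2 term is what keeps the edit from drifting into an unrelated image; the paper's full loss additionally uses an identity term for faces, which is omitted here.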

Quick Start & Requirements

  • Installation: Requires Anaconda, CLIP, and a StyleGAN2 implementation (PyTorch or TensorFlow, depending on the method).
  • Dependencies:
    • CLIP: pip install ftfy regex tqdm gdown git+https://github.com/openai/CLIP.git (a quick sanity check follows this list)
    • PyTorch methods: PyTorch 1.7.1, cudatoolkit=<CUDA_VERSION>
    • TensorFlow methods: TensorFlow 1.14 or 1.15 (tensorflow-gpu==1.14)
    • Pretrained StyleGAN2/StyleGAN2-ADA models are required.
    • The mapper method requires pretrained facial recognition network weights (used for its identity loss).
  • Setup: Can involve downloading large datasets and pre-trained models.
  • Resources: Preprocessing steps for global directions can take several hours.
  • Links: Paper, Replicate, Colab Notebooks
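
After installing CLIP as above, one quick way to confirm the install works is to score an image against a few candidate prompts; the image path below is a placeholder.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("face.jpg")).unsqueeze(0).to(device)  # placeholder path
text = clip.tokenize(["a smiling face", "a neutral face"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # higher probability = better text-image match
```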

Highlighted Details

  • Supports three distinct manipulation techniques: latent optimization, latent mapper, and global directions (a toy global-direction edit is sketched after this list).
  • Offers both interactive GUI and programmatic control.
  • Compatible with custom StyleGAN2 and StyleGAN2-ADA models and custom images.
  • Methods can be applied to both generated and real images (after inversion).
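
To make the global-directions idea concrete, the toy snippet below applies the same normalized direction, scaled by a strength factor, to any style code. The dimensionality and values are made up for illustration; StyleCLIP derives its real directions from CLIP-space prompt differences and channel relevance in StyleSpace.

```python
import torch

style_dim = 6048                     # illustrative StyleSpace size, not the exact figure
s = torch.randn(1, style_dim)        # style code of some image (flattened for the toy)
delta_s = torch.randn(style_dim)     # input-agnostic direction for one text edit
delta_s = delta_s / delta_s.norm()

alpha = 3.0                          # edit strength; a negative sign reverses the edit
s_edited = s + alpha * delta_s       # the same direction edits any input image
```

Because the direction is input-agnostic, it is computed once per text prompt and then reused interactively, which is what makes this method fast enough for a GUI.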

Maintenance & Community

  • Last update: October 2022 (added global direction support for PyTorch).
  • Primary author: Or Patashnik.
  • Citation available for academic use.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README.
  • StyleGAN implementations used may have their own licenses.
  • CLIP is released by OpenAI under the MIT license.

Limitations & Caveats

  • The TensorFlow implementation for global directions requires older TF versions (1.14/1.15), potentially causing compatibility issues with modern environments.
  • Preprocessing for global directions can be time-consuming.
  • Editing real images requires an inversion step using external tools like e4e.
Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 35 stars in the last 90 days
