StyleCLIP by orpatashnik

Text-driven StyleGAN imagery manipulation via CLIP models

Created 4 years ago
4,115 stars

Top 12.0% on SourcePulse

Project Summary

StyleCLIP enables text-driven manipulation of StyleGAN-generated imagery by leveraging CLIP's visual-language understanding. It offers three methods for users to edit images based on textual descriptions: latent vector optimization, a trained latent mapper, and global directions in StyleSpace, providing flexible control over image generation and modification.

How It Works

StyleCLIP couples StyleGAN's generative capabilities with CLIP's joint text-image embedding. The latent optimization method iteratively adjusts a latent vector to minimize a CLIP-based loss between the generated image and a text prompt. The latent mapper, trained for a specific text prompt, predicts a latent residual in a single forward pass, giving faster and more stable edits. The global directions method maps a text prompt to an input-agnostic direction in StyleGAN's StyleSpace, enabling interactive control over manipulation strength and disentanglement.
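As a rough illustration of the optimization method, the sketch below minimizes a CLIP loss over a latent code. A tiny stand-in module replaces the pretrained StyleGAN2 generator, and the prompt, step count, and regularization weight are all illustrative; the actual implementation also adds an identity loss for face edits.

```python
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

class ToyGenerator(torch.nn.Module):
    """Placeholder for a pretrained StyleGAN2 generator (not a real GAN):
    it just reshapes the latent into a 224x224 'image' so the sketch runs."""
    def forward(self, w):
        return F.interpolate(w.view(1, 3, 16, 16), size=(224, 224),
                             mode="bilinear", align_corners=False)

G = ToyGenerator().to(device)
text_emb = clip_model.encode_text(
    clip.tokenize(["a person with blue hair"]).to(device)).detach()

w = torch.randn(1, 768, device=device, requires_grad=True)
w_init = w.detach().clone()
opt = torch.optim.Adam([w], lr=0.01)

for _ in range(200):
    img_emb = clip_model.encode_image(G(w))          # CLIP-embed the image
    clip_loss = 1 - F.cosine_similarity(img_emb, text_emb).mean()
    # StyleCLIP also regularizes drift from the starting latent; a plain
    # L2 penalty (illustrative weight) stands in for that term here.
    loss = clip_loss + 0.008 * ((w - w_init) ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```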

Quick Start & Requirements

  • Installation: Requires Anaconda, CLIP, and a StyleGAN2 implementation (PyTorch or TensorFlow, depending on the method).
  • Dependencies:
    • CLIP: pip install ftfy regex tqdm gdown git+https://github.com/openai/CLIP.git (a quick smoke test is sketched after this list)
    • PyTorch methods: PyTorch 1.7.1, cudatoolkit=<CUDA_VERSION>
    • TensorFlow methods: TensorFlow 1.14 or 1.15 (tensorflow-gpu==1.14)
    • Pretrained StyleGAN2/StyleGAN2-ADA models are required.
    • The mapper method additionally requires pretrained face-recognition network weights (used for its identity loss).
  • Setup: Can involve downloading large datasets and pre-trained models.
  • Resources: Preprocessing steps for global directions can take several hours.
  • Links: Paper, Replicate, Colab Notebooks
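Once the dependencies are installed, a minimal smoke test for the CLIP package might look like the following (the small ViT-B/32 checkpoint is used here purely for illustration):

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # downloads weights

with torch.no_grad():
    tokens = clip.tokenize(["a smiling face", "an angry face"]).to(device)
    text_emb = model.encode_text(tokens)

print(text_emb.shape)  # torch.Size([2, 512]) for ViT-B/32
```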

Highlighted Details

  • Supports three distinct manipulation techniques: latent optimization, latent mapper, and global directions (sketched after this list).
  • Offers both interactive GUI and programmatic control.
  • Compatible with custom StyleGAN2 and StyleGAN2-ADA models and custom images.
  • Methods can be applied to both generated and real images (after inversion).
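To make the global directions idea concrete, here is a simplified sketch: one text-derived direction in StyleSpace is reused for any input, with a strength knob (alpha) and a disentanglement threshold (beta). The function name, tensor dimension, and thresholding rule are illustrative simplifications; the released code thresholds a per-channel relevance score rather than the direction's own magnitude.

```python
import torch

def apply_global_direction(s: torch.Tensor, delta_s: torch.Tensor,
                           alpha: float, beta: float) -> torch.Tensor:
    """Shift a StyleSpace code `s` along a text-derived direction `delta_s`.
    Channels weakly related to the edit are zeroed (disentanglement via
    `beta`); `alpha` scales the manipulation strength."""
    masked = torch.where(delta_s.abs() >= beta, delta_s,
                         torch.zeros_like(delta_s))
    return s + alpha * masked

# The same direction edits any image's style code (input-agnostic).
dim = 6048  # illustrative; not the actual StyleSpace dimensionality
s_a, s_b = torch.randn(dim), torch.randn(dim)  # two images' style codes
delta_s = torch.randn(dim)                     # in practice, text-derived
edited_a = apply_global_direction(s_a, delta_s, alpha=3.0, beta=0.1)
edited_b = apply_global_direction(s_b, delta_s, alpha=3.0, beta=0.1)
```

Because the direction is computed once per prompt, applying it is cheap enough to drive interactive sliders over alpha and beta.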

Maintenance & Community

  • Last update: October 2022 (added global direction support for PyTorch).
  • Primary author: Or Patashnik.
  • Citation available for academic use.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README.
  • The StyleGAN2/StyleGAN2-ADA code and pretrained weights from NVIDIA carry NVIDIA's source code license, which restricts commercial use.
  • CLIP is released by OpenAI under the MIT license.

Limitations & Caveats

  • The TensorFlow implementation for global directions requires older TF versions (1.14/1.15), potentially causing compatibility issues with modern environments.
  • Preprocessing for global directions can be time-consuming.
  • Editing real images requires an inversion step using external tools like e4e.
Health Check

  • Last commit: 2 years ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days
