StyleCLIP enables text-driven manipulation of StyleGAN-generated imagery by leveraging CLIP's joint text-image embedding. It offers three methods for editing images from textual descriptions: latent vector optimization, a trained latent mapper, and global directions in StyleSpace, providing flexible control over image generation and modification.
How It Works
StyleCLIP integrates StyleGAN's generative capabilities with CLIP's text-image alignment. The latent vector optimization method uses a CLIP-based loss to adjust latent vectors according to text prompts. The latent mapper learns to infer text-guided latent vector residuals for faster, more stable edits. Global directions identify input-agnostic manipulations in StyleGAN's style space, allowing interactive, text-driven adjustments.
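To make the first method concrete, here is a minimal sketch of CLIP-guided latent optimization. The generator `G` and starting latent `w_init` are placeholders for a pretrained StyleGAN2 generator and a sampled or inverted W+ code (they are not StyleCLIP's actual API), and the identity loss used in the paper is omitted for brevity:

```python
# Sketch of latent-vector optimization under a CLIP loss. Assumed placeholders:
#   G      -- pretrained StyleGAN2 generator: W+ latent (1, 18, 512) -> RGB image in [-1, 1]
#   w_init -- W+ latent of the image to edit (sampled, or inverted with e.g. e4e)
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()

# Encode the target text prompt once; gradients only flow into the latent.
text = clip.tokenize(["a face with blue hair"]).to(device)
with torch.no_grad():
    text_features = clip_model.encode_text(text)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

w = w_init.detach().clone().requires_grad_(True)
optimizer = torch.optim.Adam([w], lr=0.1)

# CLIP's expected input normalization statistics.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

for step in range(300):
    img = G(w)                                   # (1, 3, H, W) in [-1, 1]
    img = F.interpolate((img + 1) / 2, size=(224, 224), mode="bilinear", align_corners=False)
    image_features = clip_model.encode_image((img - mean) / std)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)

    clip_loss = 1 - (image_features * text_features).sum(dim=-1).mean()
    latent_loss = ((w - w_init) ** 2).mean()     # keeps the edit close to the source image
    loss = clip_loss + 0.008 * latent_loss       # 0.008 is an illustrative weight

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The latent mapper replaces this per-image optimization with a network trained once per prompt to predict latent residuals. The toy module below illustrates the idea only; the real mapper splits the W+ code into coarse, medium, and fine groups with a separate sub-mapper for each:

```python
import torch.nn as nn

class ToyLatentMapper(nn.Module):
    """Illustrative stand-in for StyleCLIP's mapper (not its actual module):
    predicts a residual that nudges a W+ latent toward a fixed text prompt."""
    def __init__(self, dim=512, depth=4):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(dim, dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, w):                 # w: (batch, 18, 512) W+ code
        return w + 0.1 * self.net(w)      # small, latent-dependent edit
```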
Quick Start & Requirements
- Installation: Requires Anaconda, CLIP, and a StyleGAN2 implementation (PyTorch or TensorFlow, depending on the method).
- Dependencies:
  - CLIP: `pip install ftfy regex tqdm gdown git+https://github.com/openai/CLIP.git`
  - PyTorch methods: PyTorch 1.7.1 with `cudatoolkit=<CUDA_VERSION>`
  - TensorFlow methods: TensorFlow 1.14 or 1.15 (`tensorflow-gpu==1.14`)
- Pretrained StyleGAN2/StyleGAN2-ADA models are required.
- The mapper method requires pretrained facial recognition network weights (used for its identity loss).
- Setup: Expect large downloads of pretrained models and, for some methods, datasets.
- Resources: Preprocessing steps for global directions can take several hours.
- Links: Paper, Replicate, Colab Notebooks
Highlighted Details
- Supports three distinct manipulation techniques: latent optimization, latent mapper, and global directions (a minimal sketch of the global-direction text encoding follows this list).
- Offers both interactive GUI and programmatic control.
- Compatible with custom StyleGAN2 and StyleGAN2-ADA models and custom images.
- Methods can be applied to both generated and real images (after inversion).
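Since global directions is the least intuitive of the three techniques, here is a sketch of its first stage, assuming only OpenAI's `clip` package: the edit is encoded as a normalized difference between CLIP text embeddings of a target and a neutral prompt, averaged over prompt templates. Mapping this direction onto StyleSpace channels via per-channel relevance estimates is omitted, and the two templates shown are illustrative stand-ins for the larger template set used in practice:

```python
# Compute an input-agnostic text direction in CLIP embedding space.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.eval()

templates = ["a photo of a {}", "a cropped photo of a {}"]  # illustrative subset

def embed(text):
    """Average the normalized CLIP embeddings of a phrase over prompt templates."""
    tokens = clip.tokenize([t.format(text) for t in templates]).to(device)
    with torch.no_grad():
        feats = model.encode_text(tokens)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)

delta_t = embed("face with glasses") - embed("face")
delta_t = delta_t / delta_t.norm()   # normalized edit direction
```

Because `delta_t` depends only on the prompts, the resulting manipulation is input-agnostic: once mapped to StyleSpace, the same direction can be applied to any image's style code at interactive rates.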
Maintenance & Community
- Last update: October 2022 (added global direction support for PyTorch).
- Primary author: Or Patashnik.
- Citation available for academic use.
Licensing & Compatibility
- The repository itself does not explicitly state a license in the README.
- StyleGAN implementations used may have their own licenses.
- CLIP is released by OpenAI under the permissive MIT license.
Limitations & Caveats
- The TensorFlow implementation for global directions requires older TF versions (1.14/1.15), potentially causing compatibility issues with modern environments.
- Preprocessing for global directions can be time-consuming.
- Editing real images requires an inversion step using external tools like e4e.