StyleGAN-NADA by rinongal

Domain adaptation for image generators via text prompts

created 4 years ago
1,184 stars

Top 33.6% on sourcepulse

Project Summary

StyleGAN-NADA enables text-guided adaptation of pre-trained image generators to new domains without requiring any images from the target domain. This is achieved by leveraging CLIP's semantic understanding to align generated images with textual descriptions, allowing for novel style and shape transformations. The primary audience is researchers and practitioners working with generative models who need to adapt them to specific styles or concepts efficiently.

How It Works

The method trains two paired copies of a pre-trained generator: one is frozen as a reference while the other is fine-tuned. Training optimizes a directional CLIP loss, which pushes the shift between the CLIP embeddings of images produced by the frozen and trained generators to align with the shift between the CLIP embeddings of a source and a target text prompt (e.g. "photo" to "sketch"). This enables "blind" domain adaptation, relying solely on CLIP's semantic guidance rather than any target-domain images.
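
A minimal sketch of that directional loss, assuming OpenAI's clip package; the repository's own implementation wraps this in a loss class with extra options (multiple CLIP models, weighting), so treat this as illustrative rather than the exact code:

    import torch
    import clip

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _ = clip.load("ViT-B/32", device=device)

    def embed_text(prompt):
        # Encode a prompt such as "photo" or "sketch" into CLIP space.
        tokens = clip.tokenize([prompt]).to(device)
        with torch.no_grad():
            return model.encode_text(tokens).float()

    def directional_loss(frozen_imgs, trained_imgs, source_text, target_text):
        # frozen_imgs / trained_imgs: image batches from the frozen and trained
        # generators, already resized and normalized to CLIP's expected input.
        delta_t = embed_text(target_text) - embed_text(source_text)
        delta_i = (model.encode_image(trained_imgs).float()
                   - model.encode_image(frozen_imgs).float())
        delta_i = delta_i / delta_i.norm(dim=-1, keepdim=True)
        delta_t = delta_t / delta_t.norm(dim=-1, keepdim=True)
        # 1 - cosine similarity between the image-space and text-space shifts.
        return (1.0 - (delta_i * delta_t).sum(dim=-1)).mean()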

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via conda and pip.
  • Prerequisites: Anaconda, PyTorch 1.7.1, CUDA toolkit, CLIP (from OpenAI's GitHub), pre-trained StyleGAN2 generator.
  • Usage: A Colab notebook is provided for interactive use; command-line training runs via python train.py (see the example command after this list).
  • Demo: available on HuggingFace Spaces.
  • Docs: the project website and Colab notebook offer detailed guidance.
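
A representative training command, adapted from the README's text-guided example (paths are placeholders, and exact flags and defaults may differ between versions, so verify against python train.py --help):

    python train.py --size 1024 \
                    --batch 2 \
                    --n_sample 4 \
                    --output_dir /path/to/output \
                    --lr 0.002 \
                    --frozen_gen_ckpt /path/to/stylegan2-ffhq-config-f.pt \
                    --iter 301 \
                    --source_class "photo" \
                    --target_class "sketch" \
                    --auto_layer_k 18 \
                    --auto_layer_iters 1 \
                    --output_interval 50 \
                    --save_interval 150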

Highlighted Details

  • Adapts generators to new domains using only text prompts.
  • Supports adaptation toward the style of reference images via --style_img_dir (see the example after this list).
  • Compatible with StyleGAN3 and StyleGAN-XL models with specific flags.
  • Enables cross-domain interpolation videos and out-of-domain editing of real images using inversion techniques like ReStyle.
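
A hedged sketch of the image-styled variant flagged above, under the same assumptions as the Quick Start command; the README documents the exact flag combination for --style_img_dir, so check it before running:

    python train.py --size 1024 \
                    --batch 2 \
                    --output_dir /path/to/output \
                    --frozen_gen_ckpt /path/to/stylegan2-ffhq-config-f.pt \
                    --iter 301 \
                    --source_class "photo" \
                    --target_class "sketch" \
                    --style_img_dir /path/to/style_images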

Maintenance & Community

The project accompanies a SIGGRAPH 2022 paper. README updates note a HuggingFace Spaces demo, StyleGAN-XL support, and Replicate.ai integration, and a Google Drive folder with pre-trained models is linked.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. It depends on CLIP (MIT License) and the rosinality StyleGAN2 PyTorch port (MIT License), though pre-trained weights converted from NVIDIA's official models may carry NVIDIA's own license terms. Commercial use or closed-source linking would require clarifying the license of StyleGAN-NADA itself.

Limitations & Caveats

StyleGAN3/XL fine-tuning may exhibit grid artifacts, and those models do not currently support layer freezing. The Docker UI may hit memory limits on smaller GPUs. The README also notes that some edits work better with a particular ReStyle encoder (pSp vs. e4e).

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

10 stars in the last 90 days
