Text-to-image research paper for stylized generation via visual style prompting
This repository provides the official PyTorch implementation for "Visual Style Prompting with Swapping Self-Attention," a method for text-to-image generation that allows users to control image style without fine-tuning diffusion models. It targets researchers and developers in AI image generation seeking consistent style transfer and reduced content leakage.
How It Works
The core innovation is a training-free approach that manipulates self-attention layers during the diffusion model's denoising process. Specifically, in the later self-attention layers it keeps the query from the original (content) features but takes the key and value from the reference style features. This injects the desired visual style while preserving the content specified by the text prompt, avoiding the need for costly model fine-tuning.
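A minimal NumPy sketch of the swapping idea, assuming random stand-in features and projection weights rather than the actual SDXL attention layers: ordinary self-attention draws query, key, and value from the same features, while swapped self-attention keeps the query from the content pass but sources key and value from the style pass.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q_feats, kv_feats, w_q, w_k, w_v):
    # Scaled dot-product attention; q_feats and kv_feats may come
    # from different denoising passes (content vs. reference style).
    q = q_feats @ w_q
    k = kv_feats @ w_k
    v = kv_feats @ w_v
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.T * scale) @ v

rng = np.random.default_rng(0)
d = 8
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
content = rng.standard_normal((4, d))  # stand-in for content-pass features
style = rng.standard_normal((4, d))    # stand-in for style-pass features

# Ordinary self-attention: q, k, v all from the content features.
plain = self_attention(content, content, w_q, w_k, w_v)
# Swapped self-attention: query from content, key/value from style.
swapped = self_attention(content, style, w_q, w_k, w_v)
```

Because the attention weights are a softmax, each output row of `swapped` is a convex combination of the style features' value projections, which is how style information enters while the content query still decides where to attend.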
Quick Start & Requirements
pip install --upgrade diffusers accelerate transformers einops kornia gradio triton xformers==0.0.16
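After installing, a quick standard-library check can confirm the packages from the command above are importable; this is just a sanity check for the environment, not part of the project itself.

```python
# Report installed versions of the packages pinned in the pip command above.
# Uses only the standard library; missing packages are reported as None.
import importlib.metadata as md

versions = {}
for pkg in ["diffusers", "accelerate", "transformers", "xformers"]:
    try:
        versions[pkg] = md.version(pkg)
    except md.PackageNotFoundError:
        versions[pkg] = None  # not installed in this environment

print(versions)
```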
Maintenance & Community
The project is developed by NAVER AI Lab and Yonsei University. Links to community resources are not explicitly provided in the README.
Licensing & Compatibility
Licensed under the Apache License, Version 2.0. This license is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
The implementation requires specific versions of PyTorch and xformers (the install command pins xformers==0.0.16). While the README lists "color calibration to use a real image as reference" as a to-do, the vsp_real_script.py script appears to implement this functionality.