Framework for style-preserving text-to-image generation
Top 22.9% on sourcepulse
InstantStyle is a framework for achieving style-preserving text-to-image generation by disentangling style and content from reference images. It targets researchers and developers working with diffusion models who need to control stylistic elements in generated outputs, offering a method to apply specific styles without altering content or spatial layout.
How It Works
InstantStyle leverages CLIP's global features to decouple style and content. It achieves this by subtracting text-based content features from image features, effectively isolating style. The framework then injects this style information into specific attention layers within the diffusion model's architecture, identified empirically as crucial for capturing style (e.g., `up_blocks.0.attentions.1`) and spatial layout (e.g., `down_blocks.2.attentions.1`). This targeted injection aims to preserve content while effectively transferring style.
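The two mechanisms above can be sketched in a few lines. This is a schematic only: the random vectors stand in for real CLIP global features (which would come from a CLIP image/text encoder), and the per-block scale dictionary follows the convention `diffusers` uses for `set_ip_adapter_scale()`, where `up_blocks.0.attentions.1` corresponds to the style-relevant entry; exact block naming may vary across `diffusers` versions.

```python
import numpy as np

def extract_style_embedding(image_feat, text_feat):
    """Schematic of InstantStyle's feature subtraction: the CLIP text
    embedding of the content description is subtracted from the CLIP
    image embedding of the reference, leaving (approximately) the
    style component. Both vectors are L2-normalized first."""
    image_feat = image_feat / np.linalg.norm(image_feat)
    text_feat = text_feat / np.linalg.norm(text_feat)
    return image_feat - text_feat

# Stand-in vectors; real features come from a CLIP encoder.
rng = np.random.default_rng(0)
image_feat = rng.normal(size=768)  # global image feature of the style reference
text_feat = rng.normal(size=768)   # text feature of the content description
style_feat = extract_style_embedding(image_feat, text_feat)

# Targeted injection: a per-block scale dict in the format accepted by
# diffusers' pipeline.set_ip_adapter_scale(). Only the style-relevant
# attention block (up_blocks.0.attentions.1) receives the style signal;
# all other blocks are zeroed out.
style_only_scale = {
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
# pipeline.set_ip_adapter_scale(style_only_scale)  # on a loaded SDXL pipeline
```

Zeroing every block except the empirically identified style block is what lets the style transfer without dragging along the reference image's content or layout.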
Quick Start & Requirements
- Dependencies: `diffusers` (>=0.28.0.dev0), `accelerate`, `hidiffusion`. Requires a CUDA-enabled GPU.
- Integrations for `diffusers`, `sd-webui-controlnet`, ComfyUI, and AnyV2V are provided.

Highlighted Details
- Integration with the `diffusers` library simplifies usage.
- Style and layout control via `set_ip_adapter_scale()` for specific transformer blocks.

Maintenance & Community
The project is actively developed by the InstantX Team, with recent updates in July 2024. Links to Hugging Face and ModelScope demos are provided. Contact information for inquiries is available.
Licensing & Compatibility
The pretrained checkpoints follow the license of IP-Adapter. Users are permitted to create images but must comply with local laws and use the tool responsibly.
Limitations & Caveats
The experimental SD1.5 version is noted as having weaker perception of style information. The project builds heavily on IP-Adapter, so its performance is tied to the underlying IP-Adapter checkpoints.