InstantStyle by instantX-research

Framework for style-preserving text-to-image generation

created 1 year ago
1,954 stars

Top 22.9% on sourcepulse

Project Summary

InstantStyle is a framework for achieving style-preserving text-to-image generation by disentangling style and content from reference images. It targets researchers and developers working with diffusion models who need to control stylistic elements in generated outputs, offering a method to apply specific styles without altering content or spatial layout.

How It Works

InstantStyle uses CLIP's global features to decouple style from content. It subtracts text-based content features from the reference image's features, leaving a feature that mostly encodes style. The framework then injects this style information only into specific attention layers of the diffusion model, identified empirically as the ones that matter most for style (e.g., up_blocks.0.attentions.1) and spatial layout (e.g., down_blocks.2.attentions.1). Restricting injection to these layers is intended to transfer style while leaving the generated content intact.
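The subtraction step can be illustrated with a toy sketch. The vectors below are made-up stand-ins for CLIP embeddings, and `subtract_content` is a hypothetical helper name, not a function from the InstantStyle repository. The sketch only assumes that the image and text embeddings share the same space and length.

```python
def subtract_content(image_embed, text_embed):
    """Return image_embed - text_embed, element by element."""
    if len(image_embed) != len(text_embed):
        raise ValueError("embeddings must have the same length")
    return [i - t for i, t in zip(image_embed, text_embed)]


# Made-up 3-dimensional stand-ins; real CLIP embeddings are much larger.
reference_image_embed = [1.0, 2.0, 3.0]  # mixes style and content of the reference
content_text_embed = [0.5, 2.0, 1.0]     # text features describing the content only

# What remains after removing the content direction is treated as "style".
style_embed = subtract_content(reference_image_embed, content_text_embed)
print(style_embed)  # [0.5, 0.0, 2.0]
```

In a real pipeline the two inputs would come from the CLIP image and text encoders; the arithmetic itself stays this simple.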

Quick Start & Requirements

  • Install: Clone the repository and download IP-Adapter checkpoints.
  • Prerequisites: Python, PyTorch, Hugging Face diffusers (>=0.28.0.dev0), accelerate, hidiffusion. Requires a CUDA-enabled GPU.
  • Resources: VRAM requirements depend on the model size; SDXL examples suggest at least 10GB per GPU for distributed inference.
  • Demos: Online demos available on Hugging Face Spaces and ModelScope. Integration examples for diffusers, sd-webui-controlnet, ComfyUI, and AnyV2V are provided.
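Before running the examples, it can help to confirm that the listed Python packages are importable. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, the import names are assumed to match the package names (`torch` for PyTorch), and the check covers neither package versions nor CUDA availability.

```python
import importlib.util

# Import names assumed for the prerequisites listed above.
REQUIRED_MODULES = ["torch", "diffusers", "accelerate", "hidiffusion"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```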

Highlighted Details

  • Native integration with Hugging Face diffusers library simplifies usage.
  • Supports fine-grained control over style injection via set_ip_adapter_scale() for specific transformer blocks.
  • Enables high-resolution generation through integration with HiDiffusion.
  • Offers experimental distributed inference for multi-GPU setups.
  • Supports multiple IP-Adapter images with masks for precise layout control.
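As a rough sketch of the per-block control, the snippet below builds the nested scale dictionaries that diffusers' `set_ip_adapter_scale()` accepts for SDXL. In that layout, index 1 of `"up"` / `"block_0"` corresponds to up_blocks.0.attentions.1 (style) and index 1 of `"down"` / `"block_2"` to down_blocks.2.attentions.1 (layout). The helper names `style_only_scale`, `style_and_layout_scale`, and `apply_scale` are hypothetical, and `pipeline` is assumed to be an SDXL diffusers pipeline with an IP-Adapter already loaded.

```python
def style_only_scale(strength=1.0):
    """Scale dict that enables the IP-Adapter only in the style layer."""
    # up_blocks.0 has three attention layers; only attentions.1 is enabled.
    return {"up": {"block_0": [0.0, strength, 0.0]}}


def style_and_layout_scale(strength=1.0):
    """Scale dict that enables both the layout and the style layers."""
    return {
        # down_blocks.2 has two attention layers; attentions.1 carries layout.
        "down": {"block_2": [0.0, strength]},
        "up": {"block_0": [0.0, strength, 0.0]},
    }


def apply_scale(pipeline, scale):
    """Hand the scale dict to a pipeline that has an IP-Adapter loaded."""
    pipeline.set_ip_adapter_scale(scale)


print(style_only_scale())
```

Blocks that do not appear in the dictionary receive no IP-Adapter signal, which is what keeps the reference image's content from leaking into the output.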

Maintenance & Community

The project is developed by the InstantX Team; the most recent updates noted date from July 2024. The repository links to the Hugging Face and ModelScope demos and provides contact information for inquiries.

Licensing & Compatibility

The pretrained checkpoints follow the license of IP-Adapter. Users are permitted to create images but must comply with local laws and use the tool responsibly.

Limitations & Caveats

The experimental SD1.5 version is noted as having weaker perception of style information. The project relies heavily on IP-Adapter, and its performance is tied to the underlying IP-Adapter checkpoints.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

55 stars in the last 90 days
