Kandinsky-2 by ai-forever

Multilingual text-to-image latent diffusion model

created 2 years ago
2,806 stars

Top 17.3% on sourcepulse

Project Summary

Kandinsky 2.2 is a multilingual text-to-image diffusion model offering enhanced aesthetic quality and text comprehension through a new CLIP-ViT-G image encoder and ControlNet support for guided image generation. It targets researchers and developers seeking advanced text-to-image capabilities with fine-grained control.

How It Works

Kandinsky 2.2 uses a latent diffusion architecture built around a CLIP-ViT-G image encoder (1.8B parameters) and a latent diffusion U-Net (1.22B parameters). A diffusion image prior maps text embeddings to CLIP image embeddings, and the U-Net denoises image latents conditioned on those embeddings. Because the U-Net is conditioned on image embeddings, an input image can also be embedded directly by the image encoder, which enables image-to-image generation alongside text-to-image. ControlNet integration adds further conditioning inputs, such as depth maps, for finer control over the generated image.
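The two-stage flow above can be illustrated with a toy Python sketch. Every function here (text_encoder, image_encoder, diffusion_prior, unet_generate) is a hypothetical stand-in written for illustration, not part of the Kandinsky-2 package; each neural network is reduced to simple arithmetic on short lists so that only the routing of embeddings is shown: a prompt goes through the prior, while an input image is embedded directly and skips it.

```python
from typing import List

Vector = List[float]
EMB_DIM = 4  # toy embedding size; real embeddings are far larger


def text_encoder(prompt: str) -> Vector:
    # Hypothetical stand-in for the multilingual text encoder:
    # folds character codes into a fixed-size vector.
    emb = [0.0] * EMB_DIM
    for i, ch in enumerate(prompt):
        emb[i % EMB_DIM] += ord(ch) / 1000.0
    return emb


def image_encoder(pixels: Vector) -> Vector:
    # Hypothetical stand-in for the CLIP-ViT-G image encoder:
    # sums strided pixel groups into an image embedding.
    return [sum(pixels[i::EMB_DIM]) for i in range(EMB_DIM)]


def diffusion_prior(text_emb: Vector) -> Vector:
    # Hypothetical stand-in for the diffusion image prior:
    # maps a text embedding into the image-embedding space.
    return [0.5 * x for x in text_emb]


def unet_generate(image_emb: Vector, steps: int = 3) -> Vector:
    # Hypothetical stand-in for the latent U-Net: starts from a zero
    # "latent" and moves halfway toward the conditioning embedding each step.
    latent = [0.0] * len(image_emb)
    for _ in range(steps):
        latent = [z + 0.5 * (c - z) for z, c in zip(latent, image_emb)]
    return latent


def text_to_image(prompt: str) -> Vector:
    # Text path: prompt -> text embedding -> prior -> image embedding -> U-Net.
    return unet_generate(diffusion_prior(text_encoder(prompt)))


def image_to_image(pixels: Vector) -> Vector:
    # Image-conditioned path: the image encoder yields the image embedding
    # directly, so the prior is skipped.
    return unet_generate(image_encoder(pixels))
```

For example, image_to_image([1.0] * 8) embeds the input as [2.0, 2.0, 2.0, 2.0] and returns [1.75, 1.75, 1.75, 1.75] after three toy steps. The point of the split is that both paths share the same U-Net; only the source of the image embedding differs.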

Quick Start & Requirements

  • Install via pip: pip install "git+https://github.com/ai-forever/Kandinsky-2.git"
  • Requires a CUDA-enabled GPU.
  • Example usage and notebooks are available in the ./notebooks folder.
  • Official Demo: fusionbrain.ai

Highlighted Details

  • Supports text-to-image, image-to-image, and inpainting.
  • Version 2.2 introduces CLIP-ViT-G for improved aesthetics and text understanding.
  • ControlNet support enables precise image generation control.
  • Multilingual capabilities via the XLM-Roberta-Large-Vit-L-14 text encoder.
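ControlNet's published design attaches a trainable branch to a frozen diffusion U-Net through zero-initialized layers, so an extra condition (for example, a depth map) has no effect until training moves those weights away from zero. The sketch below is a minimal, hypothetical illustration of that zero-initialization idea using plain lists; ZeroInitLayer, frozen_unet_block, control_branch and controlled_block are invented for this example and do not reflect Kandinsky-2's actual ControlNet code.

```python
from typing import List

Vector = List[float]


class ZeroInitLayer:
    """Hypothetical per-channel linear layer standing in for a 1x1 'zero convolution'."""

    def __init__(self, channels: int) -> None:
        # Zero initialization: the control branch contributes nothing at first.
        self.weight = [0.0] * channels
        self.bias = [0.0] * channels

    def __call__(self, x: Vector) -> Vector:
        return [w * v + b for w, v, b in zip(self.weight, x, self.bias)]


def frozen_unet_block(latent: Vector) -> Vector:
    # Hypothetical stand-in for a pretrained U-Net block whose weights stay frozen.
    return [2.0 * v for v in latent]


def control_branch(latent: Vector, hint: Vector) -> Vector:
    # Hypothetical stand-in for the trainable branch that sees the latent
    # plus the extra condition (e.g. a depth map).
    return [z + h for z, h in zip(latent, hint)]


def controlled_block(latent: Vector, hint: Vector, zero_layer: ZeroInitLayer) -> Vector:
    # Output = frozen block + zero-initialized residual from the control branch.
    base = frozen_unet_block(latent)
    residual = zero_layer(control_branch(latent, hint))
    return [b + r for b, r in zip(base, residual)]
```

With a freshly constructed ZeroInitLayer(3), controlled_block([1.0, 2.0, 3.0], [5.0, 5.0, 5.0], layer) returns [2.0, 4.0, 6.0], identical to the frozen block alone; setting layer.weight = [0.5] * 3 (as if after training) gives [5.0, 7.5, 10.0]. Starting from zero lets the pretrained model's behavior be preserved exactly at the beginning of ControlNet training.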

Maintenance & Community

  • Developed by ai-forever.
  • Key contributors include Arseniy Shakhmatov, Anton Razzhigaev, and Aleksandr Nikolich.
  • The README links to the authors' GitHub profiles and blogs.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The README does not name a license, which may complicate commercial adoption.
  • The 2.1 usage examples set use_flash_attention=False, so flash attention is off in the documented configuration; enabling it may improve speed and memory use but likely depends on a suitable environment.
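Flash attention computes exact attention block by block with a running (online) softmax instead of materializing the full score matrix, so it is expected to change memory use and speed rather than results, up to floating-point rounding. The sketch below, a single-query toy in pure Python with hypothetical function names, compares naive attention with a tiled online-softmax version; it illustrates the general technique and is not the kernel Kandinsky-2 calls when use_flash_attention is enabled.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _score(q: Vec, k: Vec) -> float:
    # Scaled dot-product score, as in standard attention.
    return sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))


def naive_attention(q: Vec, keys: Sequence[Vec], values: Sequence[Vec]) -> List[float]:
    # Materializes every score, then applies a numerically stable softmax.
    scores = [_score(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def tiled_attention(q: Vec, keys: Sequence[Vec], values: Sequence[Vec], block: int = 2) -> List[float]:
    # Processes keys/values one block at a time, keeping only a running max,
    # a running softmax denominator and a running weighted sum of values.
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        scores = [_score(q, k) for k in keys[start:start + block]]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new max
        # (exp(-inf) == 0.0 on the first block, so nothing carries over).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, values[start:start + block]):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vd for a, vd in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions return the same weighted average of the values for any block size; the tiled version simply never holds more than one block of scores at a time, which is the property that makes flash-style kernels memory-efficient on a GPU.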

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 24 stars in the last 90 days
