Kandinsky-2 by ai-forever

Multilingual text-to-image latent diffusion model

Created 2 years ago
2,810 stars

Top 16.9% on SourcePulse

Project Summary

Kandinsky 2.2 is a multilingual text-to-image diffusion model. It adds a new CLIP-ViT-G image encoder, which improves aesthetic quality and text comprehension, and ControlNet support for guided image generation. It targets researchers and developers who need text-to-image generation with fine-grained control.

How It Works

Kandinsky 2.2 uses a latent diffusion architecture built around a CLIP-ViT-G image encoder (1.8B parameters) and a U-Net diffusion model (1.22B parameters). A diffusion image prior maps text embeddings to image embeddings, which enables both text-to-image and image-to-image generation. ControlNet integration adds control over generation through extra conditioning inputs such as depth maps.
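The data flow described above (text embedding, then image prior, then decoder) can be illustrated with a toy sketch. This is not the project's code: every function here is a hypothetical stand-in that uses trivial arithmetic in place of the real networks, and the vector size is arbitrary. It only shows the order in which the stages are chained.

```python
from typing import List

Vector = List[float]


def toy_text_encoder(prompt: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for the text encoder: folds characters into a small vector."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def toy_image_prior(text_emb: Vector) -> Vector:
    """Hypothetical stand-in for the diffusion image prior: text embedding -> image embedding."""
    return [2.0 * x + 1.0 for x in text_emb]


def toy_decoder(image_emb: Vector, height: int, width: int) -> List[List[float]]:
    """Hypothetical stand-in for the U-Net and decoder: fills a height x width grid from the image embedding."""
    n = len(image_emb)
    return [[image_emb[(r + c) % n] for c in range(width)] for r in range(height)]


def text_to_image(prompt: str, height: int = 2, width: int = 3) -> List[List[float]]:
    """Chain the three stages in the order the section describes."""
    text_emb = toy_text_encoder(prompt)
    image_emb = toy_image_prior(text_emb)
    return toy_decoder(image_emb, height, width)
```

The point of the two-stage design is that the decoder never sees the text embedding directly; it is conditioned on the image embedding that the prior produces.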

Quick Start & Requirements

  • Install via pip: pip install "git+https://github.com/ai-forever/Kandinsky-2.git"
  • Requires a CUDA-enabled GPU.
  • Example usage and notebooks are available in the ./notebooks folder.
  • Official Demo: fusionbrain.ai
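After installation, a text-to-image call might look like the sketch below. It follows the call pattern from the project's README as best recalled; the argument names (task_type, model_version, decoder_steps, h, w) and the prompt are assumptions to check against the installed version and the notebooks. The helper names build_generation_kwargs and generate are hypothetical. The import is placed inside the function so the file can be loaded on a machine without the package or a GPU.

```python
PROMPT = "red cat, 4k photo"  # illustrative prompt, not taken from this summary


def build_generation_kwargs(height=1024, width=768, decoder_steps=50, batch_size=1):
    """Collect the keyword arguments assumed to be accepted by generate_text2img."""
    return {
        "decoder_steps": decoder_steps,
        "batch_size": batch_size,
        "h": height,
        "w": width,
    }


def generate(prompt, device="cuda"):
    """Load Kandinsky 2.2 and run one text-to-image call (needs the package and a CUDA GPU)."""
    # Lazy import: only required when generation is actually requested.
    from kandinsky2 import get_kandinsky2

    # Assumed signature: device first, then task_type and model_version keywords.
    model = get_kandinsky2(device, task_type="text2img", model_version="2.2")
    return model.generate_text2img(prompt, **build_generation_kwargs())
```

Calling generate(PROMPT) downloads the model weights on first use, so expect the first run to take a while.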

Highlighted Details

  • Supports text-to-image, image-to-image, and inpainting.
  • Version 2.2 introduces CLIP-ViT-G for improved aesthetics and text understanding.
  • ControlNet support enables precise image generation control.
  • Multilingual capabilities via XLM-Roberta-Large-Vit-L-14 text encoder.

Maintenance & Community

  • Developed by ai-forever.
  • Key contributors include Arseniy Shakhmatov, Anton Razzhigaev, and Aleksandr Nikolich.
  • Links to the authors' GitHub profiles and blogs are provided.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Because the README states no license, commercial adoption carries legal uncertainty until the terms are confirmed.
  • The 2.1 usage passes use_flash_attention=False, so flash attention is off in that configuration. Enabling it may improve performance but may also require additional setup.
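The flash attention setting mentioned above is passed when the model is loaded. The sketch below assumes the get_kandinsky2 call pattern from the project's README; the helper names build_model_kwargs and load_kandinsky_21 are hypothetical, and whether use_flash_attention=True works depends on the environment.

```python
def build_model_kwargs(use_flash_attention=False):
    """Keyword arguments assumed to be accepted by get_kandinsky2 for version 2.1."""
    return {
        "task_type": "text2img",
        "model_version": "2.1",
        # False mirrors the setting noted for 2.1; True is untested here.
        "use_flash_attention": use_flash_attention,
    }


def load_kandinsky_21(device="cuda", use_flash_attention=False):
    """Load Kandinsky 2.1 (needs the package and a CUDA GPU)."""
    # Lazy import so this file can be loaded without the package installed.
    from kandinsky2 import get_kandinsky2

    return get_kandinsky2(device, **build_model_kwargs(use_flash_attention))
```

Keeping the flag in one helper makes it easy to compare runs with and without flash attention once the environment supports it.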

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
