Kandinsky-3 by ai-forever

Text-to-image diffusion model for multifunctional generative tasks

created 1 year ago
380 stars

Top 76.1% on sourcepulse

View on GitHub
Project Summary

Kandinsky 3.1 is a text-to-image diffusion model designed for high-quality, realistic image generation with enhanced features. It targets researchers and power users seeking advanced control and efficiency in AI-driven visual content creation, offering improvements over its predecessor, Kandinsky 3.0.

How It Works

Kandinsky 3.1 builds upon a latent diffusion architecture, incorporating a Flan-UL2 text encoder and a large U-Net. A key innovation is Kandinsky Flash, a distilled model using Adversarial Diffusion Distillation on latents for significantly faster inference (4 steps) without quality degradation. It also features prompt beautification via an LLM (Intel's neural-chat-7b-v3-1) and integrates IP-Adapter and ControlNet for image-conditional generation.
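
As a concrete illustration of the text-to-image path, the released Kandinsky 3 weights can also be driven through Hugging Face diffusers. This is a minimal sketch under the assumption that the kandinsky-community/kandinsky-3 checkpoint and diffusers' AutoPipelineForText2Image API are used; neither is prescribed by this repository, which provides its own example notebooks.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Minimal text-to-image sketch, assuming the kandinsky-community/kandinsky-3
# checkpoint on the Hugging Face Hub.
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # offload submodules to keep VRAM usage manageable

prompt = "A photograph of a red fox in a snowy forest, golden hour lighting"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("fox.png")
```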

Quick Start & Requirements

  • Install via pip install -r requirements.txt after creating a conda environment.
  • Requires CUDA 11.1+ and PyTorch 1.10.1 (a quick version check is sketched after this list).
  • Example usage is provided in Jupyter notebooks under ./examples.
  • Official HuggingFace repository and project page links are available.
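
After installing the pinned requirements, a quick sanity check of the environment can be run from Python. This is a minimal sketch; the exact version strings depend on the wheels you installed, and only the CUDA availability check is strictly required to run the model.

```python
import torch

# Verify the environment roughly matches the pinned requirements
# (PyTorch 1.10.1 built against CUDA 11.1).
print(torch.__version__)          # e.g. "1.10.1+cu111" per requirements.txt
print(torch.version.cuda)         # e.g. "11.1"
print(torch.cuda.is_available())  # a CUDA-capable GPU is needed for inference
```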

Highlighted Details

  • Kandinsky Flash offers 4-step inference, 3x faster than the base model.
  • Integrates prompt beautification using an LLM for improved text-to-image results.
  • Supports inpainting, image fusion, and image variations (an image-to-image sketch follows this list).
  • IP-Adapter and ControlNet enable image-conditional generation.
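
The diffusers library also ships an image-to-image pipeline for Kandinsky 3, which can approximate the image-variation workflow listed above. The sketch below is an assumption-laden illustration: the kandinsky-community/kandinsky-3 checkpoint name, the AutoPipelineForImage2Image routing, and the strength value are not taken from this repository.

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Image-to-image sketch, assuming diffusers routes the
# kandinsky-community/kandinsky-3 checkpoint to its Kandinsky 3 img2img pipeline.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # reduces peak VRAM usage

init_image = load_image("input.png")  # any local image or URL
prompt = "the same scene rendered as a watercolor painting"
image = pipe(prompt, image=init_image, strength=0.75).images[0]
image.save("variation.png")
```

Note that the repository's own IP-Adapter and ControlNet integrations are a separate path from this diffusers-based sketch.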

Maintenance & Community

The project is actively developed by a team including Vladimir Arkhipkin, Anastasia Maltseva, Andrei Filatov, and Igor Pavlov. Links to HuggingFace and a Telegram bot are provided for community engagement.

Licensing & Compatibility

The model weights are released under a permissive license, allowing for commercial use and integration into closed-source applications.

Limitations & Caveats

The initial installation instructions specify PyTorch 1.10.1+cu111, which is an older version and may require careful dependency management for compatibility with newer CUDA toolkits.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 19 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers) and Omar Sanseviero (DevRel at Google DeepMind).

Kandinsky-2 by ai-forever
Multilingual text-to-image latent diffusion model
0.0%, 3k stars, created 2 years ago, updated 1 year ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 2 more.

glide-text2im by openai
Text-conditional image synthesis model from research paper
0.1%, 4k stars, created 3 years ago, updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI
Latent diffusion model for high-resolution image synthesis
0.1%, 41k stars, created 2 years ago, updated 1 month ago