Kandinsky-3 by ai-forever

Text-to-image diffusion model for multifunctional generative tasks

Created 1 year ago
384 stars

Top 74.4% on SourcePulse

Project Summary

Kandinsky 3.1 is a text-to-image diffusion model for high-quality, realistic image generation. It extends its predecessor, Kandinsky 3.0, with faster inference, prompt beautification, and image-conditional generation, and targets researchers and power users who want finer control and efficiency in AI-driven visual content creation.

How It Works

Kandinsky 3.1 builds on a latent diffusion architecture with a Flan-UL2 text encoder and a large U-Net. Its key addition is Kandinsky Flash, a distilled model trained with Adversarial Diffusion Distillation on latents, which cuts inference to 4 steps with no reported quality degradation. The model also offers prompt beautification via an LLM (Intel's neural-chat-7b-v3-1) and integrates IP-Adapter and ControlNet for image-conditional generation.

Quick Start & Requirements

  • Create a conda environment, then install dependencies with pip install -r requirements.txt.
  • Requires CUDA 11.1+ and PyTorch 1.10.1.
  • Example usage is provided as Jupyter notebooks in ./examples.
  • Official HuggingFace repository and project page links are available.

Highlighted Details

  • Kandinsky Flash offers 4-step inference, 3x faster than the base model.
  • Integrates prompt beautification using an LLM for improved text-to-image results.
  • Supports inpainting, image fusion, and image variations.
  • IP-Adapter and ControlNet enable image-conditional generation.

Maintenance & Community

The project is developed by a team including Vladimir Arkhipkin, Anastasia Maltseva, Andrei Filatov, and Igor Pavlov. Links to HuggingFace and a Telegram bot are provided for community engagement.

Licensing & Compatibility

The model weights are released under a permissive license, allowing for commercial use and integration into closed-source applications.

Limitations & Caveats

The initial installation instructions specify PyTorch 1.10.1+cu111, which is an older version and may require careful dependency management for compatibility with newer CUDA toolkits.
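One way to catch a mismatch early is to compare the installed PyTorch build string against the pinned one before running anything heavy. The helper below is a minimal sketch using only the standard library; the names TorchBuild, parse_torch_version, and matches_pin are hypothetical and not part of the repository. It assumes version strings of the form release+localtag, such as the 1.10.1+cu111 mentioned above.

```python
import re
from typing import NamedTuple, Optional, Tuple


class TorchBuild(NamedTuple):
    release: Tuple[int, ...]   # e.g. (1, 10, 1)
    cuda_tag: Optional[str]    # e.g. "cu111", or None when no local tag is present


def parse_torch_version(version: str) -> TorchBuild:
    """Split a build string such as '1.10.1+cu111' into release and local tag."""
    public, _, local = version.strip().partition("+")
    match = re.match(r"\d+(?:\.\d+)*", public)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    release = tuple(int(part) for part in match.group(0).split("."))
    return TorchBuild(release, local or None)


def matches_pin(version: str, pinned: str = "1.10.1+cu111") -> bool:
    """Return True when both the release and the CUDA tag equal the pinned build."""
    return parse_torch_version(version) == parse_torch_version(pinned)


for candidate in ("1.10.1+cu111", "2.1.0+cu121", "1.10.1"):
    print(candidate, parse_torch_version(candidate), matches_pin(candidate))
```

In a real environment the string to check would be torch.__version__. A False result does not necessarily mean the code will fail on a newer build, only that the environment differs from the one the instructions pin.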

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 4 stars in the last 30 days
