kandinskylab/kandinsky-5: Advanced diffusion models for versatile video and image generation
Top 62.6% on SourcePulse
Kandinsky 5.0 provides a family of advanced diffusion models for generating high-quality images and videos from text and image prompts. It targets engineers, researchers, and power users who need robust AI media generation, offering model sizes from 2B to 19B parameters to suit different quality and compute budgets.
How It Works
The system employs a latent diffusion pipeline, leveraging a Diffusion Transformer (DiT) as its core generative backbone. Generation is conditioned on text embeddings derived from Qwen2.5-VL and CLIP models, with video encoding and decoding handled by the HunyuanVideo 3D VAE. Its novelty lies in offering distinct "Pro" (19B) and "Lite" (2B, 6B) model variants, supporting various generation tasks (T2V, I2V, T2I, I2I), and incorporating advanced optimizations like Flow Matching and cross-attention for controllable, high-fidelity outputs.
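To make that pipeline concrete, here is a minimal, self-contained sketch of flow-matching sampling with a toy conditioned backbone. It illustrates the technique only and is not the repository's implementation: every class name and dimension below is an assumption.

```python
# A minimal sketch of the pipeline shape described above -- NOT the repository's
# code. TinyDiT stands in for the Diffusion Transformer, the Euler loop for the
# flow-matching sampler, and the linear conditioning for cross-attention over
# Qwen2.5-VL/CLIP embeddings.
import torch
import torch.nn as nn

class TinyDiT(nn.Module):
    """Illustrative stand-in for the DiT backbone: predicts a velocity field."""
    def __init__(self, latent_dim=16, text_dim=32):
        super().__init__()
        self.cond = nn.Linear(text_dim, latent_dim)       # toy substitute for cross-attention
        self.net = nn.Linear(latent_dim + 1, latent_dim)  # +1 for the timestep input

    def forward(self, x, t, text_emb):
        h = torch.cat([x, t[:, None]], dim=-1)            # condition on time
        return self.net(h) + self.cond(text_emb)          # add text conditioning

def flow_matching_sample(dit, text_emb, shape, steps=50):
    """Euler-integrate the learned velocity field from noise (t=0) toward data (t=1)."""
    x = torch.randn(shape)                                # start from Gaussian noise in latent space
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)
        x = x + dit(x, t, text_emb) * dt                  # one Euler step along the flow
    return x

# In the real system, text_emb comes from Qwen2.5-VL/CLIP and the sampled
# latents are decoded to frames by the HunyuanVideo 3D VAE.
latents = flow_matching_sample(TinyDiT(), torch.randn(1, 32), shape=(1, 16))
```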
Quick Start & Requirements
Clone the repository, install dependencies, then download the weights you need:

```bash
git clone https://github.com/kandinskylab/kandinsky-5.git
cd kandinsky-5
pip install -r requirements.txt
python download_models.py --models <model_name>
```
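Because the published latency figures assume a specific stack (see Limitations & Caveats below), it is worth verifying the local environment first; the following check uses only standard PyTorch calls:

```python
# Quick environment check against the benchmarked stack (PyTorch 2.8,
# CUDA 12.8.1, NVIDIA H100). A mismatch does not block generation, but the
# published latency numbers may not apply.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("CUDA build:", torch.version.cuda)
```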
Maintenance & Community
The README lists an extensive core team and contributor roster, indicating active development. Beta testing for Kandinsky Video Lite is available via a Telegram bot.
Licensing & Compatibility
The repository's license is not explicitly stated in the README, which may pose a barrier for commercial or specific integration use cases.
Limitations & Caveats
A known bug in the source build can produce noisy output for 10-second generations with the NABLA algorithm; a workaround is documented. Latency benchmarks apply only to a specific high-end stack (NVIDIA H100, CUDA 12.8.1, PyTorch 2.8).