numz: Diffusion upscaling for video and images
This project provides an official SeedVR2 video and image upscaler implementation for ComfyUI, enabling high-quality diffusion-based enhancement. It targets users seeking advanced upscaling capabilities, offering significant flexibility and performance optimizations to run on a wide range of hardware, including systems with limited VRAM. The primary benefit is achieving superior video and image quality through sophisticated AI models, with extensive control over the upscaling process.
How It Works
The core approach leverages Diffusion Transformer (DiT) models for upscaling and Variational Autoencoders (VAEs) for frame encoding/decoding. It employs aggressive memory optimization techniques such as BlockSwap (swapping model blocks between GPU and CPU) and VAE tiling (processing large resolutions in segments) to drastically reduce VRAM requirements. Further performance gains are achieved through torch.compile integration for DiT and VAE, support for FP8 and GGUF quantized models, and an optional modular architecture with specialized nodes for granular control.
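VAE tiling in particular is easy to illustrate: instead of pushing a full-resolution frame through the VAE at once, the frame is split into overlapping tiles that are processed one at a time and blended back together. Below is a minimal numpy sketch of the idea; it is illustrative only, and the project's actual tile sizes, blending, and implementation differ:

```python
import numpy as np

def process_tiled(image, fn, tile=64, overlap=8):
    """Process a large image in overlapping tiles so that only one
    tile needs to be held in memory (e.g. on the GPU) at a time.
    `fn` stands in for the hypothetical VAE encode/decode step."""
    h, w = image.shape[:2]
    out = np.zeros_like(image, dtype=np.float64)
    count = np.zeros((h, w, 1), dtype=np.float64)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            y1, x1 = min(y + tile, h), min(x + tile, w)
            out[y:y1, x:x1] += fn(image[y:y1, x:x1])
            count[y:y1, x:x1] += 1.0  # average overlapping regions
    return out / count

frame = np.random.rand(200, 300, 3)
restored = process_tiled(frame, fn=lambda t: t)  # identity "VAE" for demo
assert np.allclose(restored, frame)
```

The same trade-memory-for-transfers idea underlies BlockSwap, except that whole transformer blocks, rather than image tiles, are moved between CPU and GPU on demand.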
Quick Start & Requirements
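A typical ComfyUI custom-node installation looks like the following; the repository URL here is an assumption, so verify it against the project's GitHub page:

```shell
# Hypothetical URL -- check the project page for the canonical repository.
cd ComfyUI/custom_nodes
git clone https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler.git
pip install -r ComfyUI-SeedVR2_VideoUpscaler/requirements.txt
```

Run pip from the same Python environment that launches ComfyUI so the dependencies are visible to it.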
Clone the repository into ComfyUI's custom_nodes directory and install dependencies via requirements.txt using ComfyUI's Python environment. Optional dependencies include torch.compile support, Triton (for the torch.compile inductor backend), and Flash Attention 2.
Highlighted Details
torch.compile integration offers substantial speedups (20-40% for the DiT, 15-25% for the VAE), alongside support for FP8 and GGUF quantized models.
Maintenance & Community
This project is a collaborative effort between NumZ and AInVFX, building upon the original SeedVR2 by ByteDance. It benefits from numerous community contributors. Active development is tracked via GitHub Issues, and community support is available through GitHub Discussions.
Licensing & Compatibility
The project is released under the permissive MIT License, making it suitable for commercial use and integration into closed-source projects.
Limitations & Caveats
The batch_size parameter must satisfy the form 4n + 1 (1, 5, 9, ...) due to the temporal-consistency architecture. The VAE encoding/decoding stages can become a performance bottleneck, particularly at high resolutions. While the project is optimized for low VRAM, results and speed depend heavily on available hardware, model precision, and effective use of the memory-saving features. Version 2.5.0 introduced breaking changes that require workflow updates.
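The 4n + 1 constraint is simple to check or round to in a workflow script. The helper names below are illustrative, not part of the node's API:

```python
def is_valid_batch_size(b: int) -> bool:
    # Temporal consistency requires batch_size = 4n + 1 (1, 5, 9, 13, ...).
    return b >= 1 and (b - 1) % 4 == 0

def nearest_valid_batch_size(b: int) -> int:
    # Round an arbitrary value down to the closest valid batch size.
    return max(1, b - (b - 1) % 4)

print([b for b in range(1, 15) if is_valid_batch_size(b)])  # → [1, 5, 9, 13]
print(nearest_valid_batch_size(12))                          # → 9
```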