numz: Diffusion upscaling for video and images
This project provides an official SeedVR2 video and image upscaler implementation for ComfyUI, enabling high-quality diffusion-based enhancement. It targets users seeking advanced upscaling capabilities, offering significant flexibility and performance optimizations to run on a wide range of hardware, including systems with limited VRAM. The primary benefit is achieving superior video and image quality through sophisticated AI models, with extensive control over the upscaling process.
How It Works
The core approach leverages Diffusion Transformer (DiT) models for upscaling and Variational Autoencoders (VAEs) for frame encoding/decoding. It employs aggressive memory optimization techniques such as BlockSwap (swapping model blocks between GPU and CPU) and VAE tiling (processing large resolutions in segments) to drastically reduce VRAM requirements. Further performance gains are achieved through torch.compile integration for DiT and VAE, support for FP8 and GGUF quantized models, and an optional modular architecture with specialized nodes for granular control.
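VAE tiling in particular is easy to illustrate: instead of pushing a full-resolution frame through the VAE at once, the frame is split into overlapping tiles that are processed one at a time and blended back together. Below is a minimal numpy sketch of the idea; it is illustrative only, and the project's actual tile sizes, blending, and implementation differ:

```python
import numpy as np

def process_tiled(image, fn, tile=64, overlap=8):
    """Process a large image in overlapping tiles so that only one
    tile needs to be held in memory (e.g. on the GPU) at a time.
    `fn` stands in for the hypothetical VAE encode/decode step."""
    h, w = image.shape[:2]
    out = np.zeros_like(image, dtype=np.float64)
    count = np.zeros((h, w, 1), dtype=np.float64)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            y1, x1 = min(y + tile, h), min(x + tile, w)
            out[y:y1, x:x1] += fn(image[y:y1, x:x1])
            count[y:y1, x:x1] += 1.0  # average overlapping regions
    return out / count

frame = np.random.rand(200, 300, 3)
restored = process_tiled(frame, fn=lambda t: t)  # identity "VAE" for demo
assert np.allclose(restored, frame)
```

The same trade-memory-for-transfers idea underlies BlockSwap, except that whole transformer blocks, rather than image tiles, are moved between CPU and GPU on demand.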
Quick Start & Requirements
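A typical ComfyUI custom-node installation looks like the following; the repository URL here is an assumption, so verify it against the project's GitHub page:

```shell
# Hypothetical URL -- check the project page for the canonical repository.
cd ComfyUI/custom_nodes
git clone https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler.git
pip install -r ComfyUI-SeedVR2_VideoUpscaler/requirements.txt
```

Run pip from the same Python environment that launches ComfyUI so the dependencies are visible to it.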
Clone the repository into ComfyUI's custom_nodes directory and install dependencies via requirements.txt using ComfyUI's Python environment. Optional dependencies include torch.compile support, Triton (for the torch.compile inductor backend), and Flash Attention 2.
Highlighted Details
torch.compile integration offers substantial speedups (20-40% for the DiT, 15-25% for the VAE), alongside support for FP8 and GGUF quantized models.
Maintenance & Community
This project is a collaborative effort between NumZ and AInVFX, building upon the original SeedVR2 by ByteDance. It benefits from numerous community contributors. Active development is tracked via GitHub Issues, and community support is available through GitHub Discussions.
Licensing & Compatibility
The project is released under the permissive MIT License, making it suitable for commercial use and integration into closed-source projects.
Limitations & Caveats
The batch_size parameter must satisfy the form 4n + 1 (1, 5, 9, ...) due to the temporal-consistency architecture. The VAE encoding/decoding stages can become a performance bottleneck, particularly at high resolutions. While the project is optimized for low VRAM, results and speed depend heavily on available hardware, model precision, and effective use of the memory-saving features. Version 2.5.0 introduced breaking changes that require workflow updates.
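The 4n + 1 constraint is simple to check or round to in a workflow script. The helper names below are illustrative, not part of the node's API:

```python
def is_valid_batch_size(b: int) -> bool:
    # Temporal consistency requires batch_size = 4n + 1 (1, 5, 9, 13, ...).
    return b >= 1 and (b - 1) % 4 == 0

def nearest_valid_batch_size(b: int) -> int:
    # Round an arbitrary value down to the closest valid batch size.
    return max(1, b - (b - 1) % 4)

print([b for b in range(1, 15) if is_valid_batch_size(b)])  # → [1, 5, 9, 13]
print(nearest_valid_batch_size(12))                          # → 9
```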