SparkVSR by taco-group

Controllable video super-resolution via sparse keyframe propagation

Created 3 weeks ago

334 stars

Top 82.4% on SourcePulse

Project Summary

SparkVSR introduces an interactive Video Super-Resolution (VSR) framework, addressing the limitations of black-box VSR models by enabling users to control output quality via sparse keyframes. This project targets researchers and practitioners seeking controllable VSR solutions, offering improved temporal consistency and restoration quality, with potential applications beyond VSR.

How It Works

SparkVSR employs a two-stage training pipeline. The first stage uses a keyframe-conditioned latent-pixel approach that fuses low-resolution video latents with sparsely encoded high-resolution keyframe latents, enabling robust cross-space propagation; the second stage refines perceptual details in pixel space. At inference, the framework supports flexible keyframe selection (manual, codec I-frame, or random sampling) and a reference-free guidance mechanism that balances adherence to the keyframes against blind restoration, keeping results robust even when the references are imperfect.
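The reference-free guidance described above can be pictured as a classifier-free-guidance-style interpolation between a keyframe-conditioned prediction and a blind (unconditioned) restoration prediction. The sketch below is an assumption about the mechanism, not the project's actual implementation; the function name and the exact blending rule are hypothetical.

```python
import numpy as np

def guided_denoise(eps_ref: np.ndarray, eps_blind: np.ndarray,
                   guidance_scale: float = 1.5) -> np.ndarray:
    """Blend a keyframe-conditioned denoiser output with a blind
    restoration output, classifier-free-guidance style (hypothetical).

    guidance_scale = 0  -> pure blind restoration (ignore keyframes)
    guidance_scale = 1  -> pure keyframe-conditioned prediction
    guidance_scale > 1  -> extrapolate toward the keyframe condition
    """
    return eps_blind + guidance_scale * (eps_ref - eps_blind)

# Toy latents standing in for the two denoiser outputs.
blind = np.zeros((4, 8, 8))
ref = np.ones((4, 8, 8))
out = guided_denoise(ref, blind, guidance_scale=0.5)
print(out.mean())  # 0.5: halfway between blind and keyframe-guided
```

A scale below 1 down-weights an unreliable reference toward blind restoration, which matches the README's claim of robustness to imperfect keyframes.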

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (conda create -n sparkvsr python=3.10), activate it, and install dependencies (pip install -r requirements.txt). A specific PyTorch installation for CUDA 12.4 is provided: pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124.
  • Prerequisites: Python 3.10+, PyTorch >= 2.5.0, Diffusers. Datasets (HQ-VSR, DIV2K-HR for training; UDM10, SPMCS, YouHQ40, RealVSR, MovieLQ for testing) must be downloaded and prepared using prepare_dataset.py. Inference modes may require external API keys or separate model installations (e.g., PiSA-SR).
  • Resource Footprint: Training requires 4x A100 GPUs. Inference setup for API mode requires an API key; PiSA-SR mode requires cloning its repository and downloading weights.
  • Links: GitHub Repo, PyTorch Previous Versions.
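The installation steps above can be collected into one setup script. This is a sketch: the repository URL is assumed from the project and organization names, and the CUDA 12.4 wheel index should be adjusted to match your driver.

```shell
# Repository URL assumed from the project/org names; verify before use.
git clone https://github.com/taco-group/SparkVSR.git
cd SparkVSR

# Dedicated environment with Python 3.10, per the README.
conda create -n sparkvsr python=3.10 -y
conda activate sparkvsr

# PyTorch wheels built for CUDA 12.4, as specified in the README.
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 \
    --index-url https://download.pytorch.org/whl/cu124

# Remaining dependencies.
pip install -r requirements.txt
```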

Highlighted Details

  • Achieves state-of-the-art performance, surpassing baselines by up to 24.6% on CLIP-IQA, 21.8% on DOVER, and 5.6% on MUSIQ.
  • Demonstrates generality by applying out-of-the-box to unseen tasks like old-film restoration and video style transfer.
  • Offers three inference modes: API (using fal-ai/nano-banana-pro), PiSA-SR (using open-source PiSA-SR), and a no-reference fallback mode.
  • Keyframe selection is flexible, with a recommendation to use the first frame for short clips and a strict interval requirement (>4 frames) between reference indices.
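The ">4 frames" spacing rule for reference indices is easy to check before launching inference. The helper below is a hypothetical validator (not part of the released code); `min` gap semantics follow the README's strict ">4 frames" wording.

```python
def validate_reference_indices(indices, num_frames):
    """Check user-supplied keyframe (reference) indices: each index must
    lie inside the clip, and consecutive references must be more than
    4 frames apart, per the README's stated constraint. Hypothetical
    helper; the released code may enforce the rule differently.
    """
    idx = sorted(indices)
    if any(i < 0 or i >= num_frames for i in idx):
        raise ValueError("reference index out of range")
    for a, b in zip(idx, idx[1:]):
        if b - a <= 4:
            raise ValueError(
                f"references {a} and {b} are too close (need >4 frames apart)")
    return idx

print(validate_reference_indices([0, 10, 20], num_frames=32))  # [0, 10, 20]
```

For a short clip, passing `[0]` (the first frame, as the README recommends) trivially satisfies the constraint.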

Maintenance & Community

The project was released on March 17, 2026. Key contributors are listed as authors from Texas A&M University and YouTube/Google. Several items are marked as TODO, including releasing inference code, pre-trained models, project page, and ComfyUI integration. No community channels (Discord, Slack) or social handles are provided.

Licensing & Compatibility

The license type is not explicitly stated in the README. Compatibility for commercial use or linking with closed-source projects is therefore undetermined.

Limitations & Caveats

As a newly released project (March 2026), SparkVSR has several outstanding TODO items, indicating it is likely in an early development or alpha stage. Training demands significant hardware resources (4x A100 GPUs), and certain inference modes require external dependencies and API keys, adding complexity to setup. The absence of a stated license is a critical adoption blocker for many use cases.

Health Check
Last Commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
14
Star History
335 stars in the last 26 days

Explore Similar Projects

Starred by Zhuohan Li (coauthor of vLLM), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 2 more.

FastVideo by hao-ai-lab

Framework for accelerated video generation
3k stars · Created 1 year ago · Updated 1 day ago