FlashVSR by OpenImagingLab

Diffusion-based framework for real-time streaming video super-resolution

Created 5 months ago

1,397 stars

Top 28.7% on SourcePulse


1 Expert Loves This Project

Jiaming Song

Chief Scientist at Luma AI

Project Summary

FlashVSR addresses latency and computational challenges in diffusion-based video super-resolution (VSR) for real-time streaming. This efficient, one-step diffusion framework targets researchers and practitioners, offering significant speedups and scalability to ultra-high resolutions without quality loss.

How It Works

The framework uses a train-friendly three-stage distillation pipeline for streaming VSR. It introduces two key components: Locality-Constrained Sparse Attention (LCSA), which reduces computation and bridges the gap between training and test resolutions, and a Tiny Conditional Decoder, which accelerates reconstruction while keeping quality high. Together these enable practical real-time performance and scalability.
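The summary does not say how LCSA is implemented, so the following is only a rough illustration of the general idea of locality-constrained attention, not FlashVSR's code. It assumes tokens sit on a 2D grid and that each query may attend only to keys within a fixed square window; the function names and the `radius` parameter are hypothetical. Under that assumption, the fraction of attention pairs that must be computed shrinks as the grid grows, while the number of keys seen by any one query stays the same at every resolution, which is one plausible reading of how a locality constraint could narrow a train-test resolution gap.

```python
def local_neighbors(h, w, radius):
    """Map each grid position to the key positions it may attend to.

    Hypothetical helper: a query at (i, j) sees keys (k, l) with
    |i - k| <= radius and |j - l| <= radius, clipped at the borders.
    """
    allowed = {}
    for i in range(h):
        for j in range(w):
            allowed[(i, j)] = [
                (k, l)
                for k in range(max(0, i - radius), min(h, i + radius + 1))
                for l in range(max(0, j - radius), min(w, j + radius + 1))
            ]
    return allowed


def attention_density(h, w, radius):
    """Fraction of query-key pairs kept, relative to dense attention."""
    allowed = local_neighbors(h, w, radius)
    kept = sum(len(keys) for keys in allowed.values())
    return kept / float((h * w) ** 2)


def max_keys_per_query(h, w, radius):
    """Largest number of keys any single query attends to."""
    allowed = local_neighbors(h, w, radius)
    return max(len(keys) for keys in allowed.values())


if __name__ == "__main__":
    for size in (8, 16, 32):
        print(size, max_keys_per_query(size, size, 1),
              round(attention_density(size, size, 1), 4))
```

In this toy setup the per-query window is bounded by `(2 * radius + 1) ** 2` regardless of grid size, whereas dense attention would grow with the full token count.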

Quick Start & Requirements

Installation requires cloning the repo (https://github.com/OpenImagingLab/FlashVSR), setting up Python 3.11.13, and running pip install -e . and pip install -r requirements.txt. A critical prerequisite is Block-Sparse Attention, whose compilation is memory-intensive. It is optimized for NVIDIA A100/A800/H200 GPUs; compatibility with other NVIDIA GPUs is unknown. Downloading the model weights requires Git LFS. For Block-Sparse Attention documentation, see https://github.com/mit-han-lab/Block-Sparse-Attention.
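Before starting the memory-intensive build, it may help to check the prerequisites listed above. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, the GPU name is passed in as a plain string rather than queried from the driver, and the Git LFS check assumes the usual `git-lfs` executable is on the PATH.

```python
import shutil
import sys

# GPU families the summary names as targets for Block-Sparse Attention.
TESTED_GPUS = ("A100", "A800", "H200")


def python_matches(version_info=sys.version_info, wanted=(3, 11)):
    """True if the interpreter's major.minor version equals `wanted`."""
    return tuple(version_info[:2]) == wanted


def gpu_is_tested(gpu_name):
    """True if the GPU name mentions one of the tested families."""
    return any(tag in gpu_name.upper() for tag in TESTED_GPUS)


def git_lfs_available():
    """True if a `git-lfs` executable is found on the PATH (assumption)."""
    return shutil.which("git-lfs") is not None


def preflight(gpu_name):
    """Collect all checks into one report dictionary."""
    return {
        "python_3_11": python_matches(),
        "gpu_tested": gpu_is_tested(gpu_name),
        "git_lfs": git_lfs_available(),
    }


if __name__ == "__main__":
    # Replace the string with the name reported by your own tooling.
    print(preflight("NVIDIA A100-SXM4-80GB"))
```

A `False` for `gpu_tested` does not mean the build will fail; it only flags that the GPU is outside the set the summary describes as optimized.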

Highlighted Details

Achieves ~17 FPS for 768x1408 videos on a single A100 GPU.
Offers up to ~12x speedup over prior one-step diffusion VSR models.
Introduces VSR-120K dataset (120k videos, 180k images) for large-scale training.
Official implementation with LCSA module preserves finer details and avoids artifacts better than third-party versions lacking it.
Primarily designed and optimized for 4x video super-resolution.
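To put the throughput figure in perspective, the arithmetic below converts it into a per-frame time budget and pixel rate. It is a back-of-the-envelope sketch that assumes 768x1408 is the output resolution and that the 4x factor applies to both dimensions; the summary states neither explicitly, and the helper names are hypothetical.

```python
def low_res_size(out_h, out_w, scale):
    """Input size implied by an output size and an integer upscale factor."""
    return out_h // scale, out_w // scale


def pixels_per_second(out_h, out_w, fps):
    """Output pixels produced per second at the given frame rate."""
    return out_h * out_w * fps


def frame_budget_ms(fps):
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    h, w, fps, scale = 768, 1408, 17, 4  # figures quoted in the list above
    print("assumed input size:", low_res_size(h, w, scale))
    print("output pixels/s:", pixels_per_second(h, w, fps))
    print("ms per frame:", round(frame_budget_ms(fps), 1))
```

Because the quoted frame rate is approximate (~17 FPS), the derived numbers are estimates rather than measurements.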

Maintenance & Community

Community testing and feedback are active, and third-party implementations are discussed in GitHub issues (e.g., https://github.com/kijai/ComfyUI-WanVideoWrapper/issues/1441). A release of the VSR-120K dataset is planned. Main repo: https://github.com/OpenImagingLab/FlashVSR.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. No specific compatibility notes for commercial use or closed-source linking are provided.

Limitations & Caveats

Performance and compatibility on GPUs other than NVIDIA A100/A800/H200 are unknown. The Block-Sparse Attention dependency has a demanding, memory-intensive build process and may raise compatibility issues. Third-party implementations that omit LCSA may produce lower-quality output. The framework is primarily optimized for 4x super-resolution.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

94 stars in the last 30 days