FlashVSR by OpenImagingLab

Diffusion-based framework for real-time streaming video super-resolution

Created 1 month ago
1,008 stars

Top 37.0% on SourcePulse

Project Summary

FlashVSR addresses the latency and compute cost of diffusion-based video super-resolution (VSR) in real-time streaming settings. It is an efficient one-step diffusion framework aimed at researchers and practitioners, offering up to ~12x speedup over prior one-step diffusion VSR models and scaling to ultra-high resolutions without quality loss.

How It Works

The framework uses a train-friendly three-stage distillation pipeline for streaming VSR. Its key innovations are Locality-Constrained Sparse Attention (LCSA), which reduces computation and bridges the train-test resolution gap, and a Tiny Conditional Decoder, which accelerates reconstruction while keeping quality high. Together these enable practical real-time performance and scalability.
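To make the idea of locality-constrained attention concrete, here is a minimal sketch in plain Python. It is not FlashVSR's LCSA implementation: the 1D token layout, the `radius` parameter, and both function names are assumptions chosen for illustration. It only shows the general principle that each query attends to keys inside a local window, so far fewer query-key pairs are scored.

```python
import math
import random


def local_window_mask(num_tokens, radius):
    """Boolean mask: query i may attend to key j only if |i - j| <= radius."""
    return [[abs(i - j) <= radius for j in range(num_tokens)]
            for i in range(num_tokens)]


def masked_attention(q, k, v, mask):
    """Softmax attention that only scores the key positions the mask allows."""
    dim = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        allowed = [j for j, ok in enumerate(mask[i]) if ok]
        scores = [sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(dim)
                  for j in allowed]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[j][c] for e, j in zip(exps, allowed))
                    for c in range(len(v[0]))])
    return out


# Toy sizes, hypothetical: 16 tokens, 4 channels, window radius 2.
random.seed(0)
n, d, radius = 16, 4, 2
q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
v = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

mask = local_window_mask(n, radius)
out = masked_attention(q, k, v, mask)
scored_pairs = sum(map(sum, mask))
print(f"scored {scored_pairs} of {n * n} query-key pairs")  # 74 of 256
```

In this toy setting only 74 of the 256 possible pairs are scored. A production system would get the saving from a block-sparse GPU kernel rather than Python loops, which is presumably why Block-Sparse Attention is a prerequisite.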

Quick Start & Requirements

Installation requires cloning the repo (https://github.com/OpenImagingLab/FlashVSR), setting up Python 3.11.13, and running pip install -e . followed by pip install -r requirements.txt. A critical prerequisite is Block-Sparse Attention (docs: https://github.com/mit-han-lab/Block-Sparse-Attention), whose compilation is memory-intensive and which is optimized for NVIDIA A100/A800/H200 GPUs; compatibility with other NVIDIA GPUs is unknown. Downloading the model weights requires Git LFS.
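Before starting the Block-Sparse Attention build, it can help to check the prerequisites listed above. The following is a hypothetical preflight script, not part of the FlashVSR repository. The function name, the choice to match only on Python major.minor, and the use of nvidia-smi as a proxy for a working NVIDIA driver are all assumptions.

```python
import shutil
import sys

# The summary cites Python 3.11.13; comparing only major.minor is an assumption.
REQUIRED_PYTHON = (3, 11)


def preflight(version=None, which=shutil.which):
    """Return a list of human-readable warnings; an empty list means all checks passed."""
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    problems = []
    if version != REQUIRED_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, "
            f"expected {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}.x"
        )
    if which("git-lfs") is None:
        problems.append("git-lfs not found on PATH (needed for model weights)")
    if which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH (no NVIDIA driver detected)")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

The script only reports missing tools. It does not verify that the GPU is one of the A100/A800/H200 models the dependency is optimized for.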

Highlighted Details

  • Achieves ~17 FPS for 768x1408 videos on a single A100 GPU.
  • Offers up to ~12x speedup over prior one-step diffusion VSR models.
  • Introduces VSR-120K dataset (120k videos, 180k images) for large-scale training.
  • Official implementation with LCSA module preserves finer details and avoids artifacts better than third-party versions lacking it.
  • Primarily designed and optimized for 4x video super-resolution.
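The throughput and scale figures above imply a per-frame time budget and an input size, worked out in the short calculation below. It assumes that 768x1408 is the output resolution and that the 4x factor applies to both height and width. Neither assumption is confirmed by this summary, and the function name is illustrative.

```python
def streaming_budget(fps, out_h, out_w, scale):
    """Derive per-frame latency, implied input size, and output pixel rate."""
    frame_budget_ms = 1000.0 / fps                 # time available per frame
    in_h, in_w = out_h // scale, out_w // scale    # implied low-resolution input
    out_mpix_per_s = out_h * out_w * fps / 1e6     # output megapixels per second
    return frame_budget_ms, (in_h, in_w), out_mpix_per_s


budget_ms, in_size, mpix = streaming_budget(fps=17, out_h=768, out_w=1408, scale=4)
print(f"{budget_ms:.1f} ms per frame, input {in_size[0]}x{in_size[1]}, "
      f"{mpix:.2f} MP/s output")
```

Under these assumptions, ~17 FPS leaves roughly 59 ms per frame, and the corresponding low-resolution input would be 192x352.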

Maintenance & Community

Community testing and feedback are active, and third-party implementations are discussed in GitHub issues (e.g., https://github.com/kijai/ComfyUI-WanVideoWrapper/issues/1441). Release of the VSR-120K dataset is planned. Main repo: https://github.com/OpenImagingLab/FlashVSR.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. No specific compatibility notes for commercial use or closed-source linking are provided.

Limitations & Caveats

Performance and compatibility on GPUs outside NVIDIA A100/A800/H200 are unknown. The Block-Sparse Attention dependency has a demanding build process and potential compatibility issues. Third-party implementations omitting LCSA may degrade quality. The framework is primarily optimized for 4x SR.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 17
Star History: 404 stars in the last 30 days
