SeedVR by ByteDance-Seed

Diffusion models for advanced video restoration

Created 3 months ago
618 stars

Top 53.4% on SourcePulse

View on GitHub
Project Summary

SeedVR and SeedVR2 address limitations in video restoration by leveraging diffusion transformer architectures and novel training techniques. SeedVR enables arbitrary-resolution restoration without relying on pre-trained diffusion priors, while SeedVR2 achieves one-step video restoration through diffusion adversarial post-training. These projects target researchers and practitioners seeking to improve video quality and inference efficiency, particularly for real-world and AI-generated content, and offer a significant advance over conventional and patch-based diffusion methods.

How It Works

SeedVR employs a diffusion transformer model designed for generic video restoration, capable of handling arbitrary resolutions by integrating advanced video generation technologies. This approach avoids the constraints and biases of pre-trained diffusion priors and the slow inference speeds associated with patch-based sampling in existing diffusion models. SeedVR2 builds upon this by introducing a one-step restoration process using diffusion adversarial post-training. Key enhancements include an adaptive window attention mechanism that dynamically adjusts to output resolutions, ensuring consistency, and a feature matching loss to stabilize adversarial training, thereby improving efficiency and temporal consistency for high-resolution video.
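
As a concrete illustration of the feature matching idea mentioned above, the sketch below computes a mean L1 distance between discriminator features of real and restored clips. This is a generic formulation, not the SeedVR2 implementation: the discriminator, which layers' features are matched, and the loss weighting are assumptions here.

    # Generic sketch of a feature matching loss for adversarial post-training.
    # Illustration only, not the SeedVR2 code: the discriminator, the layers
    # whose features are matched, and the loss weighting are assumptions.
    import torch.nn.functional as F

    def feature_matching_loss(real_feats, fake_feats):
        """Mean L1 distance between discriminator features of real and restored clips.

        real_feats / fake_feats: lists of intermediate feature maps collected
        from the discriminator at several depths for the same batch.
        """
        loss = 0.0
        for f_real, f_fake in zip(real_feats, fake_feats):
            # Detach the real-branch features so only the generator is pushed
            # to match the discriminator's feature statistics.
            loss = loss + F.l1_loss(f_fake, f_real.detach())
        return loss / max(len(real_feats), 1)

A term of this kind is typically added to the generator's adversarial objective, which is the stabilizing role described above.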

Quick Start & Requirements

Setup involves cloning the repository (https://github.com/bytedance-seed/SeedVR.git), creating a conda environment with Python 3.10, and installing dependencies via pip install -r requirements.txt, plus flash_attn==2.5.9.post1 and apex (pre-built wheels are provided for specific CUDA/PyTorch versions). Pretrained checkpoints are available on Hugging Face (e.g., ByteDance-Seed/SeedVR2-3B). Inference requires substantial GPU resources: a single H100-80G can process clips up to 100 frames at 720x1280, while four H100-80G GPUs with sequence parallelism support 1080p and 2K output.
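
A minimal command-line sketch of the setup described above, assuming a fresh conda environment named seedvr and Hugging Face's huggingface-cli; the apex install step and the local checkpoint directory are illustrative and must be adapted to your CUDA/PyTorch versions.

    # Setup sketch based on the steps above; wheel choices and paths are examples.
    git clone https://github.com/bytedance-seed/SeedVR.git
    cd SeedVR
    conda create -n seedvr python=3.10 -y
    conda activate seedvr
    pip install -r requirements.txt
    pip install flash_attn==2.5.9.post1
    # Install apex from one of the pre-built wheels listed in the README,
    # matching your CUDA and PyTorch versions.
    # Fetch a pretrained checkpoint from Hugging Face, e.g.:
    huggingface-cli download ByteDance-Seed/SeedVR2-3B --local-dir ./ckpts/SeedVR2-3B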

Highlighted Details

  • Presents SeedVR as the "largest-ever diffusion transformer model towards generic video restoration."
  • SeedVR2 offers "one-step video restoration" with performance comparable to, or better than, existing methods.
  • SeedVR was accepted to CVPR 2025 (Highlight).
  • The codebase includes scripts for environment setup, checkpoint downloading, and inference.

Maintenance & Community

The repository was created on June 11, 2025, and the README acknowledges support from the open-source community. No dedicated community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

SeedVR and SeedVR2 are released under the Apache 2.0 license, permitting commercial use and modification.

Limitations & Caveats

The provided models are prototypes, and their performance may not perfectly match the published papers. The methods exhibit limited robustness against heavy degradations and large motions, sharing failure cases with existing approaches. Furthermore, their strong generation capabilities can lead to over-generation of details and occasional over-sharpening on lightly degraded videos, particularly at lower resolutions (e.g., 480p).

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 5

Star History

  • 96 stars in the last 30 days
