DLoRAL by yjsunnn

Diffusion model for video super-resolution

Created 4 months ago
296 stars

Top 89.4% on SourcePulse

Project Summary

DLoRAL offers a one-step diffusion model for video super-resolution, focusing on detail enhancement and temporal consistency. It is designed for researchers and practitioners in computer vision and video processing who need to upscale low-quality videos.

How It Works

The framework employs a dynamic dual-stage training scheme that alternates between two stages: the consistency stage optimizes temporal coherence, while the enhancement stage refines spatial details. The loss is interpolated smoothly between the two stages to keep training stable. During inference, both the C-LoRA and D-LoRA modules are merged into a frozen diffusion UNet, so enhancement takes a single diffusion step.
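
The smooth loss interpolation mentioned above can be illustrated with a small sketch. The summary does not say how the interpolation is scheduled, so the linear ramp, the function names, and the parameters switch_step and ramp_steps are all assumptions made for illustration, not the repository's actual code.

```python
def stage_weights(step, switch_step, ramp_steps):
    """Return (consistency_weight, enhancement_weight) for a training step.

    Assumed schedule (hypothetical): the consistency loss has full weight
    until switch_step, then weight moves linearly to the enhancement loss
    over the next ramp_steps steps.
    """
    if ramp_steps <= 0:
        raise ValueError("ramp_steps must be positive")
    t = (step - switch_step) / ramp_steps
    t = min(1.0, max(0.0, t))  # clamp progress to [0, 1]
    return 1.0 - t, t


def blended_loss(consistency_loss, enhancement_loss, step,
                 switch_step=1000, ramp_steps=200):
    """Blend the two stage losses so the objective changes gradually."""
    w_c, w_e = stage_weights(step, switch_step, ramp_steps)
    return w_c * consistency_loss + w_e * enhancement_loss
```

With these assumed defaults, the objective is purely the consistency loss up to step 1000, an even mix at step 1100, and purely the enhancement loss from step 1200 on. Avoiding an abrupt switch between objectives is the kind of stabilizing effect the summary attributes to the interpolation.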

Quick Start & Requirements

  • Installation:
      1. Clone the repository.
      2. Create and activate a Conda environment:
           conda create -n DLoRAL python=3.10
           conda activate DLoRAL
      3. Install dependencies:
           pip install -r requirements.txt
           pip install openmim
           mim install mmcv-full
           pip install mmedit
  • Prerequisites: Python 3.10, CUDA (implied by mmcv-full), PyTorch. The RAM and DAPE weights and the pretrained checkpoints must be downloaded and placed in the preset/models/ directory.
  • Inference:
      python src/test_DLoRAL.py \
        --pretrained_model_path stabilityai/stable-diffusion-2-1-base \
        --ram_ft_path /path/to/DLoRAL/preset/models/DAPE.pth \
        --ram_path /path/to/DLoRAL/preset/models/ram_swin_large_14m.pth \
        --pretrained_path /path/to/DLoRAL/preset/models/checkpoints/model.pkl \
        -i /path/to/input_videos/ \
        -o /path/to/results
  • Colab Demo: Available at https://colab.research.google.com/github/yjsunnn/DLoRAL/blob/main/notebooks/inference_dloral.ipynb
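
For scripted runs, the inference command above can be assembled in Python. This is a minimal sketch: the flags, script path, and weight file names are taken from the command listed above, while the helper names build_inference_command and missing_weights are hypothetical and not part of the repository.

```python
from pathlib import Path

# Weight files named in the inference command, relative to preset/models/.
WEIGHT_FILES = {
    "--ram_ft_path": Path("DAPE.pth"),
    "--ram_path": Path("ram_swin_large_14m.pth"),
    "--pretrained_path": Path("checkpoints") / "model.pkl",
}


def build_inference_command(repo_root, input_dir, output_dir,
                            base_model="stabilityai/stable-diffusion-2-1-base"):
    """Return the inference command as an argument list (hypothetical helper)."""
    models = Path(repo_root) / "preset" / "models"
    cmd = ["python", "src/test_DLoRAL.py", "--pretrained_model_path", base_model]
    for flag, rel_path in WEIGHT_FILES.items():
        cmd += [flag, str(models / rel_path)]
    cmd += ["-i", str(input_dir), "-o", str(output_dir)]
    return cmd


def missing_weights(repo_root):
    """List the expected weight files that are not present on disk."""
    models = Path(repo_root) / "preset" / "models"
    return [str(models / p) for p in WEIGHT_FILES.values()
            if not (models / p).is_file()]
```

A caller might first check that missing_weights(repo_root) is empty and then pass the list to subprocess.run(cmd, cwd=repo_root, check=True). Building the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.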

Highlighted Details

  • Official implementation of "One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution."
  • Achieves detail-rich and temporally consistent video upscaling.
  • Merges C-LoRA and D-LoRA into a frozen diffusion UNet for one-step inference.
  • Training code is available, with a TODO for releasing training data.
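
The merging of C-LoRA and D-LoRA into a frozen UNet, noted in the list above, can be illustrated with the standard LoRA merge, where each adapter contributes a low-rank update scale * B @ A to a frozen weight matrix. The summary does not describe how the repository performs the merge, so the formula's use here, the function names, and the per-adapter scale are assumptions for illustration only.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def merge_lora(weight, adapters):
    """Fold low-rank adapters into a copy of a frozen weight matrix.

    weight:   out x in matrix (nested lists), left unmodified.
    adapters: iterable of (B, A, scale), with B of shape out x r
              and A of shape r x in (hypothetical layout).
    Returns weight + sum(scale * B @ A) over all adapters.
    """
    merged = [row[:] for row in weight]  # copy so the frozen weight is untouched
    for B, A, scale in adapters:
        delta = matmul(B, A)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged
```

Once the updates are folded into the weights, a forward pass needs no separate adapter branches, which is consistent with the single-step inference the summary describes.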

Maintenance & Community

The project is associated with The Hong Kong Polytechnic University and OPPO Research Institute. Contact: yujingsun1999@gmail.com.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Training data is not yet released. The project is recent, and extensive community adoption and long-term maintenance have yet to be established.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 6
  • Star history: 20 stars in the last 30 days
