DLoRAL by yjsunnn

Diffusion model for video super-resolution

Created 6 months ago

328 stars

Top 83.5% on SourcePulse

Project Summary

DLoRAL offers a one-step diffusion model for video super-resolution, focusing on detail enhancement and temporal consistency. It is designed for researchers and practitioners in computer vision and video processing who need to upscale low-quality videos.

How It Works

The framework trains with a dynamic dual-stage scheme: a consistency stage that optimizes temporal coherence and an enhancement stage that refines spatial details. Smooth loss interpolation is used to keep training stable. At inference, the C-LoRA and D-LoRA modules are merged into a frozen diffusion UNet, so enhancement runs in a single step.
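To make the idea of smooth loss interpolation concrete, here is a minimal sketch of one way a training loop could blend a consistency loss and an enhancement loss without an abrupt switch. The schedule is an assumption for illustration only: the alternating stages, the linear ramp, and the names stage_weight, blended_loss, stage_len, and ramp_len are hypothetical and are not taken from the DLoRAL code.

```python
def stage_weight(step: int, stage_len: int, ramp_len: int) -> float:
    """Return the enhancement-loss weight in [0, 1] for a training step.

    Hypothetical schedule (not the authors' exact one): stages alternate
    every `stage_len` steps, starting with the consistency stage, and the
    weight moves linearly over the first `ramp_len` steps of each new
    stage instead of jumping between 0 and 1.
    """
    if not 0 < ramp_len <= stage_len:
        raise ValueError("ramp_len must be in (0, stage_len]")
    stage_index, offset = divmod(step, stage_len)
    if stage_index == 0:
        return 0.0  # training starts fully in the consistency stage
    progress = min(offset / ramp_len, 1.0)
    in_enhancement = stage_index % 2 == 1
    # Ramp up when entering enhancement, ramp down when returning to consistency.
    return progress if in_enhancement else 1.0 - progress


def blended_loss(consistency_loss: float, enhancement_loss: float, weight: float) -> float:
    """Convex combination of the two stage losses."""
    return (1.0 - weight) * consistency_loss + weight * enhancement_loss


if __name__ == "__main__":
    # Print the weight at a few steps around the stage boundaries.
    for step in (0, 100, 110, 120, 200, 210, 220):
        print(step, stage_weight(step, stage_len=100, ramp_len=20))
```

Because the weight is continuous at every stage boundary, the objective never changes abruptly between consecutive steps, which is the property a smooth interpolation is meant to provide.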

Quick Start & Requirements

Installation:
1. Clone the repository.
2. Create a Conda environment: conda create -n DLoRAL python=3.10
3. Activate it: conda activate DLoRAL
4. Install dependencies: pip install -r requirements.txt, pip install openmim, mim install mmcv-full, pip install mmedit
Prerequisites: Python 3.10, PyTorch, and CUDA (implied by mmcv-full). The RAM and DAPE weights and the pretrained checkpoints must be downloaded and placed in the preset/models/ directory.
Inference: python src/test_DLoRAL.py --pretrained_model_path stabilityai/stable-diffusion-2-1-base --ram_ft_path /path/to/DLoRAL/preset/models/DAPE.pth --ram_path '/path/to/DLoRAL/preset/models/ram_swin_large_14m.pth' --pretrained_path /path/to/DLoRAL/preset/models/checkpoints/model.pkl -i /path/to/input_videos/ -o /path/to/results
Colab Demo: Available at https://colab.research.google.com/github/yjsunnn/DLoRAL/blob/main/notebooks/inference_dloral.ipynb
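The inference command above takes several long paths, so it may help to assemble it programmatically. The sketch below builds the same argument list from a repository root and reports which weight files are missing. The script name, flags, and file names come from the command above; the helper names build_inference_command and missing_weights are hypothetical, and the sketch assumes the weights sit under preset/models/ exactly as in that command.

```python
from pathlib import Path

# Relative locations of the weight files, as used in the inference command.
WEIGHT_FILES = (
    Path("DAPE.pth"),
    Path("ram_swin_large_14m.pth"),
    Path("checkpoints") / "model.pkl",
)


def models_dir(repo_root) -> Path:
    """Directory where the downloaded weights are expected."""
    return Path(repo_root) / "preset" / "models"


def missing_weights(repo_root) -> list:
    """Return the expected weight files that do not exist on disk."""
    base = models_dir(repo_root)
    return [base / rel for rel in WEIGHT_FILES if not (base / rel).is_file()]


def build_inference_command(repo_root, input_dir, output_dir,
                            base_model="stabilityai/stable-diffusion-2-1-base") -> list:
    """Build the argument list for src/test_DLoRAL.py (run from the repo root)."""
    base = models_dir(repo_root)
    return [
        "python", "src/test_DLoRAL.py",
        "--pretrained_model_path", base_model,
        "--ram_ft_path", str(base / "DAPE.pth"),
        "--ram_path", str(base / "ram_swin_large_14m.pth"),
        "--pretrained_path", str(base / "checkpoints" / "model.pkl"),
        "-i", str(input_dir),
        "-o", str(output_dir),
    ]


if __name__ == "__main__":
    root = "/path/to/DLoRAL"
    for path in missing_weights(root):
        print("missing:", path)
    print(" ".join(build_inference_command(root, "/path/to/input_videos/", "/path/to/results")))
```

The returned list can be passed to subprocess.run(cmd, cwd=repo_root, check=True) once missing_weights reports nothing.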

Highlighted Details

Official implementation of "One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution."
Achieves detail-rich and temporally consistent video upscaling.
Merges C-LoRA and D-LoRA into a frozen diffusion UNet for one-step inference.
Training code is available, with a TODO for releasing training data.
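The merge mentioned in the highlights can be illustrated with the standard LoRA identity, in which each adapter contributes a low-rank update scale * (up @ down) that is added to the frozen weight. The sketch below applies that identity to two adapters on a single toy weight matrix. It assumes DLoRAL follows the usual LoRA merge; the source does not spell out the exact procedure, and the function names, shapes, and scales here are hypothetical. Plain Python lists are used so the example has no dependencies.

```python
Matrix = list  # a matrix is a list of rows, each row a list of floats


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base: Matrix, adapters) -> Matrix:
    """Return base + sum(scale * up @ down) without modifying `base`.

    `adapters` is an iterable of (down, up, scale) triples, where `down`
    has shape (rank, in_features) and `up` has shape (out_features, rank).
    """
    merged = [row[:] for row in base]  # copy so the frozen weight stays intact
    for down, up, scale in adapters:
        delta = matmul(up, down)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


if __name__ == "__main__":
    frozen = [[1.0, 0.0], [0.0, 1.0]]            # toy stand-in for one UNet weight
    c_lora = ([[1.0, 1.0]], [[1.0], [0.0]], 1.0)  # rank-1 "C-LoRA" stand-in
    d_lora = ([[0.0, 2.0]], [[0.0], [1.0]], 0.5)  # rank-1 "D-LoRA" stand-in
    print(merge_lora(frozen, [c_lora, d_lora]))
```

After merging, a forward pass uses a single weight matrix per layer, so the two adapters add no extra computation at inference time.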

Maintenance & Community

The project is associated with The Hong Kong Polytechnic University and OPPO Research Institute. Contact: yujingsun1999@gmail.com.

Licensing & Compatibility

The repository does not explicitly state a license, so the terms for commercial use or closed-source linking are unspecified.

Limitations & Caveats

Training data is not yet released. The project is recent, so neither broad community adoption nor long-term maintenance has been established.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 2
Star History: 10 stars in the last 30 days

Explore Similar Projects

gcd by basilevh

Generative model for extreme monocular dynamic novel view synthesis

Created 1 year ago

Updated 1 month ago

kandinsky-5 by kandinskylab

Advanced diffusion models for versatile video and image generation

Created 5 months ago

Updated 1 week ago

Eagle by NVlabs

Vision-language model for long-context multimodal learning

Created 1 year ago

Updated 2 months ago

TeaCache by ali-vilab

Training-free caching approach for video diffusion model inference

Created 1 year ago

Updated 7 months ago

Allegro by rhymes-ai

Text-to-video model for generating short, high-quality videos

Created 1 year ago

Updated 11 months ago

HYPIR by XPixelGroup

Image restoration using diffusion score priors

Created 5 months ago

Updated 2 months ago

Starred by Omar Sanseviero (DevRel at Google DeepMind).

EasyAnimate by aigc-apps

Video generator for high-resolution, long AI videos using transformer diffusion

Created 1 year ago

Updated 10 months ago

Starred by Luis Capelo (Cofounder of Lightning AI).

LTX-2 by Lightricks

DiT-based audio-video foundation model for generative tasks

Created 1 week ago

Updated 3 days ago

musubi-tuner by kohya-ss

LoRA training/inference scripts for video diffusion models

Created 1 year ago

Updated 16 hours ago

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Yaowei Zheng (Author of LLaMA-Factory), and 1 more.

FastVideo by hao-ai-lab

Framework for accelerated video generation

Created 1 year ago

Updated 1 day ago

Starred by Saining Xie (Professor at NYU) and Jiaming Song (Chief Scientist at Luma AI).

Pyramid-Flow by jy0205

Video generation method based on flow matching

Created 1 year ago

Updated 1 year ago

Starred by Gabriel Almeida (Cofounder of Langflow), Alex Yu (Research Scientist at OpenAI; Cofounder of Luma AI), and 2 more.

LTX-Video by Lightricks

DiT-based video generation model for high-quality, real-time video creation

Created 1 year ago

Updated 6 days ago
