FRESCO by williamyang1991

Video translation research paper (CVPR 2024)

created 1 year ago
771 stars

Top 46.2% on sourcepulse

Project Summary

FRESCO addresses zero-shot video translation by introducing a novel spatial-temporal correspondence mechanism to enhance temporal consistency in diffusion model-based video generation. It targets researchers and practitioners in computer vision and generative AI who need to translate videos without retraining models, offering improved visual coherence and robustness to motion compared to prior methods.

How It Works

FRESCO enhances zero-shot video translation by combining intra-frame (spatial) and inter-frame (temporal) correspondence, yielding a more robust spatial-temporal constraint than attention guidance or optical flow alone. Rather than only steering attention, it explicitly optimizes the diffusion features so that corresponding content stays consistent across frames, improving visual coherence in the translated video and making the method more robust to large, fast motion than prior work such as Rerender-A-Video. The sketch below illustrates the core idea.
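
A minimal PyTorch sketch of that feature update, for illustration only (this is not the repository's actual code: the function names, loss weights, and the flow/mask inputs are assumptions, and in practice the spatial term would run on downsampled features, since the HW x HW similarity matrix is large). Features are optimized by gradient descent against a combined intra-frame self-similarity loss and a flow-based inter-frame loss.

```python
import torch
import torch.nn.functional as F


def warp_with_flow(feats, flow):
    """Backward-warp features: sample feats at each target pixel's
    location plus its flow vector (flow maps frame i+1 -> frame i)."""
    n, _, h, w = feats.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feats.device),
        torch.arange(w, device=feats.device),
        indexing="ij",
    )
    coords = torch.stack((xs, ys)).float() + flow      # [N, 2, H, W]
    gx = 2 * coords[:, 0] / (w - 1) - 1                # normalize x to [-1, 1]
    gy = 2 * coords[:, 1] / (h - 1) - 1                # normalize y to [-1, 1]
    grid = torch.stack((gx, gy), dim=-1).to(feats.dtype)  # [N, H, W, 2]
    return F.grid_sample(feats, grid, align_corners=True)


def fresco_style_update(feats, ref_feats, flows, masks,
                        w_spatial=1.0, w_temporal=1.0, steps=10, lr=0.1):
    """Optimize diffusion features under a joint spatial-temporal loss.

    feats:     [N, C, H, W] features of the N frames being generated
    ref_feats: [N, C, H, W] features of the corresponding input frames
    flows:     [N-1, 2, H, W] backward optical flow, frame i+1 -> i
    masks:     [N-1, 1, H, W] validity masks (1 where flow is reliable)
    """
    n, c, h, w = feats.shape
    with torch.no_grad():  # target self-similarity from the input frames
        r = F.normalize(ref_feats.reshape(n, c, -1), dim=1)
        ref_sim = r.transpose(1, 2) @ r                # [N, HW, HW]

    feats = feats.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([feats], lr=lr)
    for _ in range(steps):
        # Spatial (intra-frame): each frame's feature self-similarity
        # should match that of its input frame.
        f = F.normalize(feats.reshape(n, c, -1), dim=1)
        loss_s = (f.transpose(1, 2) @ f - ref_sim).pow(2).mean()

        # Temporal (inter-frame): frame i's features, warped along the
        # flow, should agree with frame i+1 where the flow is valid.
        warped = warp_with_flow(feats[:-1], flows)
        loss_t = (masks * (feats[1:] - warped).pow(2)).mean()

        opt.zero_grad()
        (w_spatial * loss_s + w_temporal * loss_t).backward()
        opt.step()
    return feats.detach()
```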

Quick Start & Requirements

  • Installation: Clone the repository, set up a conda environment with PyTorch >= 2.0.0 (e.g., torch==2.0.0+cu118), install dependencies via pip install -r requirements.txt, and run python install.py to download the required models (the commands are consolidated in the block after this list).
  • Prerequisites: Python 3.8.5, PyTorch >= 2.0.0, CUDA >= 11.8 (for GPU acceleration), and Hugging Face access.
  • Running: Use python run_fresco.py <config_file> for command-line inference or python webUI.py for a Gradio-based web interface.
  • Links: Project Page, Paper, Diffusers Pipeline, Web Demo.
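
The steps above, consolidated into one sequence; the environment name and the exact torch install command are illustrative rather than taken verbatim from the README.

```
conda create -n fresco python=3.8.5
conda activate fresco
pip install torch==2.0.0+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
python install.py                   # download required models
python run_fresco.py <config_file>  # command-line inference
python webUI.py                     # or: Gradio web interface
```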

Highlighted Details

  • Achieves improved temporal consistency and robustness to motion compared to optical flow methods.
  • Zero-shot capability: no model training or fine-tuning required.
  • Flexible: compatible with off-the-shelf models such as ControlNet and LoRA (see the illustrative snippet after this list).
  • Offers extensive advanced options for frame processing, FRESCO constraints, and full video translation, including background smoothing and gradient blending.
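
To ground the "off-the-shelf" point, here is the standard Hugging Face Diffusers pattern for loading a ControlNet and a LoRA into a Stable Diffusion pipeline. This is not FRESCO-specific code (run_fresco.py drives such components through a config file instead), and the checkpoint IDs and LoRA path are placeholders.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Placeholder checkpoints: any SD 1.5-compatible base model,
# ControlNet, and style LoRA can be substituted.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("path/to/lora")  # optional: add a style LoRA
```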

Maintenance & Community

The project is actively maintained, with recent updates adding a Hugging Face Diffusers integration and a web demo. It builds primarily on Rerender-A-Video, ControlNet, Stable Diffusion, GMFlow, and Ebsynth.

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, given its dependencies on other projects, users should verify compatibility for commercial or closed-source use.

Limitations & Caveats

The README notes potential out-of-memory (OOM) errors on large videos and recommends reducing the batch size (a generic chunking pattern is sketched below). Newer versions of the diffusers library may require modifications to the my_forward() function. The video_blend.py script for temporal blending is adapted from a previous work (Rerender-A-Video's Ebsynth-based blending) and its usage may differ from the main pipeline.
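
The batch-size remedy is the usual chunked-processing pattern: translating a few frames per pass bounds peak GPU memory by the chunk size rather than the clip length. A generic sketch, where translate_batch, the batch size, and the overlap are hypothetical rather than FRESCO's actual interface:

```python
def translate_video(frames, translate_batch, batch_size=4, overlap=1):
    """Translate a long frame list in overlapping chunks so that peak
    memory depends on batch_size, not on the total number of frames.
    `translate_batch` is a hypothetical per-chunk translation call."""
    out = []
    step = batch_size - overlap
    for start in range(0, len(frames), step):
        result = translate_batch(frames[start:start + batch_size])
        # Skip frames already emitted by the previous (overlapping) chunk.
        out.extend(result if start == 0 else result[overlap:])
        if start + batch_size >= len(frames):
            break
    return out
```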

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 90 days

