FRESCO by williamyang1991

Video translation research paper (CVPR 2024)

created 1 year ago
771 stars

Top 46.2% on sourcepulse

Project Summary

FRESCO addresses zero-shot video translation by introducing a novel spatial-temporal correspondence mechanism to enhance temporal consistency in diffusion model-based video generation. It targets researchers and practitioners in computer vision and generative AI who need to translate videos without retraining models, offering improved visual coherence and robustness to motion compared to prior methods.

How It Works

FRESCO enhances zero-shot video translation by combining intra-frame (spatial) and inter-frame (temporal) correspondence, yielding a more robust spatial-temporal constraint than attention guidance or optical flow alone. Rather than only steering attention, it explicitly optimizes the diffusion features so that corresponding content stays consistent across frames, improving visual coherence in the translated video and making the method more robust to large, fast motion than prior work such as Rerender-A-Video. The sketch below illustrates the core idea.
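
A minimal PyTorch sketch of that feature update, for illustration only (this is not the repository's actual code: the function names, loss weights, and the flow/mask inputs are assumptions, and in practice the spatial term would run on downsampled features, since the HW x HW similarity matrix is large). Features are optimized by gradient descent against a combined intra-frame self-similarity loss and a flow-based inter-frame loss.

```python
import torch
import torch.nn.functional as F


def warp_with_flow(feats, flow):
    """Backward-warp features: sample feats at each target pixel's
    location plus its flow vector (flow maps frame i+1 -> frame i)."""
    n, _, h, w = feats.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feats.device),
        torch.arange(w, device=feats.device),
        indexing="ij",
    )
    coords = torch.stack((xs, ys)).float() + flow      # [N, 2, H, W]
    gx = 2 * coords[:, 0] / (w - 1) - 1                # normalize x to [-1, 1]
    gy = 2 * coords[:, 1] / (h - 1) - 1                # normalize y to [-1, 1]
    grid = torch.stack((gx, gy), dim=-1).to(feats.dtype)  # [N, H, W, 2]
    return F.grid_sample(feats, grid, align_corners=True)


def fresco_style_update(feats, ref_feats, flows, masks,
                        w_spatial=1.0, w_temporal=1.0, steps=10, lr=0.1):
    """Optimize diffusion features under a joint spatial-temporal loss.

    feats:     [N, C, H, W] features of the N frames being generated
    ref_feats: [N, C, H, W] features of the corresponding input frames
    flows:     [N-1, 2, H, W] backward optical flow, frame i+1 -> i
    masks:     [N-1, 1, H, W] validity masks (1 where flow is reliable)
    """
    n, c, h, w = feats.shape
    with torch.no_grad():  # target self-similarity from the input frames
        r = F.normalize(ref_feats.reshape(n, c, -1), dim=1)
        ref_sim = r.transpose(1, 2) @ r                # [N, HW, HW]

    feats = feats.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([feats], lr=lr)
    for _ in range(steps):
        # Spatial (intra-frame): each frame's feature self-similarity
        # should match that of its input frame.
        f = F.normalize(feats.reshape(n, c, -1), dim=1)
        loss_s = (f.transpose(1, 2) @ f - ref_sim).pow(2).mean()

        # Temporal (inter-frame): frame i's features, warped along the
        # flow, should agree with frame i+1 where the flow is valid.
        warped = warp_with_flow(feats[:-1], flows)
        loss_t = (masks * (feats[1:] - warped).pow(2)).mean()

        opt.zero_grad()
        (w_spatial * loss_s + w_temporal * loss_t).backward()
        opt.step()
    return feats.detach()
```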

Quick Start & Requirements

  • Installation: Clone the repository, set up a conda environment with PyTorch >= 2.0.0 (e.g., torch==2.0.0+cu118), install dependencies via pip install -r requirements.txt, and run python install.py to download the required models (the commands are consolidated in the block after this list).
  • Prerequisites: Python 3.8.5, PyTorch >= 2.0.0, CUDA >= 11.8 (for GPU acceleration), and Hugging Face access.
  • Running: Use python run_fresco.py <config_file> for command-line inference or python webUI.py for a Gradio-based web interface.
  • Links: Project Page, Paper, Diffusers Pipeline, Web Demo.
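
The steps above, consolidated into one sequence; the environment name and the exact torch install command are illustrative rather than taken verbatim from the README.

```
conda create -n fresco python=3.8.5
conda activate fresco
pip install torch==2.0.0+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
python install.py                   # download required models
python run_fresco.py <config_file>  # command-line inference
python webUI.py                     # or: Gradio web interface
```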

Highlighted Details

  • Achieves improved temporal consistency and robustness to motion compared to optical flow methods.
  • Zero-shot capability: no model training or fine-tuning required.
  • Flexible: compatible with off-the-shelf models such as ControlNet and LoRA (see the illustrative snippet after this list).
  • Offers extensive advanced options for frame processing, FRESCO constraints, and full video translation, including background smoothing and gradient blending.
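
To ground the "off-the-shelf" point, here is the standard Hugging Face Diffusers pattern for loading a ControlNet and a LoRA into a Stable Diffusion pipeline. This is not FRESCO-specific code (run_fresco.py drives such components through a config file instead), and the checkpoint IDs and LoRA path are placeholders.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Placeholder checkpoints: any SD 1.5-compatible base model,
# ControlNet, and style LoRA can be substituted.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-hed", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("path/to/lora")  # optional: add a style LoRA
```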

Maintenance & Community

The project is actively maintained, with recent updates adding a Hugging Face Diffusers integration and a web demo. It builds primarily on Rerender-A-Video, ControlNet, Stable Diffusion, GMFlow, and Ebsynth.

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, given its dependencies on other projects, users should verify compatibility for commercial or closed-source use.

Limitations & Caveats

The README notes potential out-of-memory (OOM) errors on large videos and recommends reducing the batch size (a generic chunking pattern is sketched below). Newer versions of the diffusers library may require modifications to the my_forward() function. The video_blend.py script for temporal blending is adapted from a previous work (Rerender-A-Video's Ebsynth-based blending) and its usage may differ from the main pipeline.
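
The batch-size remedy is the usual chunked-processing pattern: translating a few frames per pass bounds peak GPU memory by the chunk size rather than the clip length. A generic sketch, where translate_batch, the batch size, and the overlap are hypothetical rather than FRESCO's actual interface:

```python
def translate_video(frames, translate_batch, batch_size=4, overlap=1):
    """Translate a long frame list in overlapping chunks so that peak
    memory depends on batch_size, not on the total number of frames.
    `translate_batch` is a hypothetical per-chunk translation call."""
    out = []
    step = batch_size - overlap
    for start in range(0, len(frames), step):
        result = translate_batch(frames[start:start + batch_size])
        # Skip frames already emitted by the previous (overlapping) chunk.
        out.extend(result if start == 0 else result[overlap:])
        if start + batch_size >= len(frames):
            break
    return out
```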

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 90 days

