Rerender_A_Video by williamyang1991

Video-to-video translation framework for zero-shot text-guided video rendering

created 2 years ago
2,983 stars

Top 16.4% on sourcepulse

Project Summary

This project provides a zero-shot, text-guided video-to-video translation framework for researchers and artists. It addresses the challenge of maintaining temporal consistency in video generation by leveraging adapted diffusion models, enabling users to restyle videos based on text prompts without retraining.

How It Works

The framework consists of two main stages: key frame translation and full video translation. Key frames are translated by a diffusion model enhanced with hierarchical cross-frame constraints that keep shape, texture, and color coherent across frames. The remaining frames are then propagated from these key frames using temporal-aware patch matching and frame blending. Because only the key frames pass through the diffusion model, the approach maintains global style and local texture consistency at low computational cost.
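A minimal sketch of this two-stage flow in plain Python. Every function name and parameter here is a hypothetical placeholder for the repository's internals (the actual entry point is rerender.py); the stage implementations are passed in as callables so the skeleton stands alone:

    from typing import Any, Callable, Dict, List

    def rerender_video(
        frames: List[Any],
        prompt: str,
        translate_key: Callable[..., Any],   # diffusion step with cross-frame constraints
        propagate: Callable[..., Any],       # temporal-aware patch matching
        blend: Callable[[Any, Any], Any],    # frame blending
        key_interval: int = 10,
    ) -> List[Any]:
        # Stage 1: translate key frames with the diffusion model, anchored
        # to the first key frame for cross-frame coherence.
        key_idx = list(range(0, len(frames), key_interval))
        keys: Dict[int, Any] = {
            i: translate_key(frames[i], prompt, anchor=frames[key_idx[0]])
            for i in key_idx
        }
        # Stage 2: propagate each key frame's style to the in-between
        # frames, blending contributions from the two nearest key frames.
        out: List[Any] = []
        for i, frame in enumerate(frames):
            if i in keys:
                out.append(keys[i])
                continue
            prev_k = max(k for k in key_idx if k < i)
            later = [k for k in key_idx if k > i]
            styled_prev = propagate(keys[prev_k], frames[prev_k], frame)
            if later:
                styled_next = propagate(keys[later[0]], frames[later[0]], frame)
                out.append(blend(styled_prev, styled_next))
            else:
                out.append(styled_prev)  # tail frames past the last key frame
        return out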

Quick Start & Requirements

  • Install: Clone the repository with --recursive and run pip install -r requirements.txt, or use the provided environment.yml (the full command sequence is shown after this list).
  • Prerequisites: Python 3.x and PyTorch with CUDA support; 24 GB of VRAM is required.
  • Run Demo: python rerender.py --cfg config/real2sculpture.json
  • More Info: Project Page
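Concretely, the quick start above corresponds to a command sequence like the following (the conda line is the standard equivalent for environment.yml and is an assumption, as the summary does not spell it out):

    git clone --recursive https://github.com/williamyang1991/Rerender_A_Video.git
    cd Rerender_A_Video
    pip install -r requirements.txt        # or: conda env create -f environment.yml
    python rerender.py --cfg config/real2sculpture.json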

Highlighted Details

  • Zero-shot translation: No training or fine-tuning required.
  • Compatibility with ControlNet and LoRA for customized translations.
  • Achieves temporal consistency through cross-frame constraints and shape/pixel-aware fusion (a minimal sketch of the cross-frame attention idea follows this list).
  • Offers a WebUI for interactive experimentation and command-line scripts for batch processing.
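At its core, a cross-frame constraint modifies self-attention so that keys and values are computed from an anchor frame (e.g., the first or previous frame) instead of the current one, tying the current frame's appearance to the anchor. A minimal PyTorch sketch of the general technique, not the repository's exact implementation:

    import torch
    import torch.nn.functional as F

    def cross_frame_attention(x_curr, x_anchor, to_q, to_k, to_v):
        """Self-attention variant where K and V come from an anchor frame.

        x_curr:   (batch, tokens, dim) features of the frame being denoised
        x_anchor: (batch, tokens, dim) features of the anchor key frame
        to_q/k/v: the attention block's usual linear projections
        """
        q = to_q(x_curr)
        k = to_k(x_anchor)  # keys taken from the anchor frame
        v = to_v(x_anchor)  # values taken from the anchor frame
        return F.scaled_dot_product_attention(q, k, v)

    # Usage with toy shapes:
    dim = 64
    to_q, to_k, to_v = (torch.nn.Linear(dim, dim) for _ in range(3))
    curr, anchor = torch.randn(1, 256, dim), torch.randn(1, 256, dim)
    out = cross_frame_attention(curr, anchor, to_q, to_k, to_v)  # (1, 256, 64)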

Maintenance & Community

The project was accepted to SIGGRAPH Asia 2023 and has been integrated into Hugging Face Diffusers. Recent updates include loose cross-frame attention and FreeU integration.

Licensing & Compatibility

The repository is released under the MIT License, permitting commercial use and linking with closed-source projects.

Limitations & Caveats

The main constraint is the 24 GB VRAM requirement, though memory reduction techniques are suggested in the documentation. Installation on Windows may require manual setup of CUDA, Git, and Visual Studio with the Windows SDK; pre-compiled binaries for ebsynth are provided as a fallback. Path names should contain only English letters or underscores, otherwise a FileNotFoundError may occur.
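One way to guard against the path issue is to validate paths before running. A small sketch; the character set below extends the documented "English letters or underscores" rule with digits and common path punctuation, which is an assumption:

    import re

    def is_safe_path(path: str) -> bool:
        """True if the path uses only characters assumed safe for this
        project: ASCII letters and underscores (per the docs), plus
        digits, dots, dashes, and path separators (assumed)."""
        return re.fullmatch(r"[A-Za-z0-9_./\\:-]+", path) is not None

    assert is_safe_path("config/real2sculpture.json")
    assert not is_safe_path("config/视频.json")  # non-ASCII may trigger FileNotFoundError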

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 13 stars in the last 90 days
