Video-P2P by dvlab-research

Video editing via cross-attention control (CVPR 2024 research paper)

created 2 years ago
414 stars

Top 71.8% on sourcepulse

Project Summary

Video-P2P provides a method for editing videos by controlling cross-attention mechanisms, enabling users to modify video content based on textual prompts. This project is targeted at researchers and practitioners in AI-driven video generation and editing. It offers a novel approach to fine-grained video manipulation, allowing for more precise and creative control over visual elements.

How It Works

Video-P2P leverages cross-attention control within a diffusion model framework, adapting the Prompt-to-Prompt technique from image editing to video. The core idea is to guide the video generation process by manipulating attention maps, so that specific regions of the video are influenced by particular text tokens. This offers finer-grained control over the editing process than global prompt-based methods.
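Below is a minimal, hedged sketch of the attention-injection idea in generic Prompt-to-Prompt style; it is not Video-P2P's actual API, and the function name and tensor shapes are illustrative assumptions:

```python
# Illustrative sketch of cross-attention control (Prompt-to-Prompt style).
# NOT Video-P2P's actual code; names and shapes are toy assumptions.
import torch

def cross_attention(q, k, v, injected_probs=None):
    """Standard cross-attention. If `injected_probs` (attention maps cached
    from the source prompt) is given, reuse them so the edited prompt keeps
    the source layout while the token values change."""
    scale = q.shape[-1] ** -0.5
    probs = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    if injected_probs is not None:
        probs = injected_probs  # inject the source-prompt attention maps
    return probs @ v, probs

# Toy tensors: 4 spatial query positions attending over 6 text tokens, dim 8.
q = torch.randn(1, 4, 8)
k_src, v_src = torch.randn(1, 6, 8), torch.randn(1, 6, 8)    # source prompt
k_edit, v_edit = torch.randn(1, 6, 8), torch.randn(1, 6, 8)  # edited prompt

# Pass 1: run with the source prompt and cache its attention maps.
_, src_probs = cross_attention(q, k_src, v_src)
# Pass 2: run with the edited prompt, reusing the cached maps so the
# spatial structure is preserved while the edited tokens supply content.
edited_out, _ = cross_attention(q, k_edit, v_edit, injected_probs=src_probs)
```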

Quick Start & Requirements

  • Install: Create a conda environment (conda create --name vp2p python=3.9, conda activate vp2p) and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Requires Python 3.9, a Stable Diffusion model (e.g., sd1.5), and at least 20GB of VRAM (tested on Tesla V100 32GB and RTX3090 24GB). xformers may require specific handling on certain GPUs.
  • Resources: Setup involves downloading a Stable Diffusion model (a minimal loading sketch follows this list). The tuning stage takes longer than the attention-control stage; a faster mode for attention control runs in about 1 minute on a V100.
  • Links: diffusers, Tune-A-Video, Gradio Demo.
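As referenced above, here is a hedged sketch of loading a Stable Diffusion checkpoint with diffusers; the path "sd1.5" is a placeholder for your local download, and Video-P2P's own scripts may assemble the pipeline components differently:

```python
# Minimal sketch: loading a Stable Diffusion checkpoint with diffusers.
# "sd1.5" is a placeholder path for your locally downloaded model;
# Video-P2P's scripts may load the model components differently.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "sd1.5",                    # local checkpoint directory
    torch_dtype=torch.float16,  # halves VRAM use on supported GPUs
)
pipe = pipe.to("cuda")
```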

Highlighted Details

  • CVPR 2024 paper.
  • Offers both a faster (1 min) and a more stable (10 min) mode for attention control on V100.
  • Includes a local Gradio demo for interactive use.
  • Released dataset available for download.

Maintenance & Community

The project is associated with dvlab-research and its authors. The README's "Todo" list shows its main milestones (code release, demo, and dataset release) completed, suggesting the planned deliverables have shipped and the project remains maintained.

Licensing & Compatibility

The README does not explicitly state a license. However, it references other projects like diffusers and Tune-A-Video, which are typically under permissive licenses (e.g., MIT, Apache 2.0). Compatibility for commercial use would require explicit license confirmation.

Limitations & Caveats

The project requires significant VRAM (20GB+), potentially limiting its use on consumer hardware. The xformers dependency might introduce compatibility issues on certain GPU configurations. The project is presented as an official implementation of a CVPR 2024 paper, implying it is research-oriented and may not have the robustness of production-ready software.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days
