Video-P2P by dvlab-research

Video editing via cross-attention control (CVPR 2024 research paper)

created 2 years ago
414 stars

Top 71.8% on sourcepulse

Project Summary

Video-P2P provides a method for editing videos by controlling cross-attention mechanisms, enabling users to modify video content based on textual prompts. This project is targeted at researchers and practitioners in AI-driven video generation and editing. It offers a novel approach to fine-grained video manipulation, allowing for more precise and creative control over visual elements.

How It Works

Video-P2P leverages cross-attention control within a diffusion model framework, adapting the Prompt-to-Prompt technique from image editing to video. The core idea is to guide the video generation process by manipulating attention maps, so that specific regions of the video are influenced by particular text tokens. This offers finer-grained control over the editing process than global prompt-based methods.
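Below is a minimal, hedged sketch of the attention-injection idea in generic Prompt-to-Prompt style; it is not Video-P2P's actual API, and the function name and tensor shapes are illustrative assumptions:

```python
# Illustrative sketch of cross-attention control (Prompt-to-Prompt style).
# NOT Video-P2P's actual code; names and shapes are toy assumptions.
import torch

def cross_attention(q, k, v, injected_probs=None):
    """Standard cross-attention. If `injected_probs` (attention maps cached
    from the source prompt) is given, reuse them so the edited prompt keeps
    the source layout while the token values change."""
    scale = q.shape[-1] ** -0.5
    probs = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    if injected_probs is not None:
        probs = injected_probs  # inject the source-prompt attention maps
    return probs @ v, probs

# Toy tensors: 4 spatial query positions attending over 6 text tokens, dim 8.
q = torch.randn(1, 4, 8)
k_src, v_src = torch.randn(1, 6, 8), torch.randn(1, 6, 8)    # source prompt
k_edit, v_edit = torch.randn(1, 6, 8), torch.randn(1, 6, 8)  # edited prompt

# Pass 1: run with the source prompt and cache its attention maps.
_, src_probs = cross_attention(q, k_src, v_src)
# Pass 2: run with the edited prompt, reusing the cached maps so the
# spatial structure is preserved while the edited tokens supply content.
edited_out, _ = cross_attention(q, k_edit, v_edit, injected_probs=src_probs)
```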

Quick Start & Requirements

  • Install: Create a conda environment (conda create --name vp2p python=3.9, conda activate vp2p) and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Requires Python 3.9, a Stable Diffusion model (e.g., sd1.5), and at least 20GB of VRAM (tested on Tesla V100 32GB and RTX3090 24GB). xformers may require specific handling on certain GPUs.
  • Resources: Setup involves downloading a Stable Diffusion model (a minimal loading sketch follows this list). The tuning stage takes longer than the attention-control stage; a faster mode for attention control runs in about 1 minute on a V100.
  • Links: diffusers, Tune-A-Video, Gradio Demo.
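As referenced above, here is a hedged sketch of loading a Stable Diffusion checkpoint with diffusers; the path "sd1.5" is a placeholder for your local download, and Video-P2P's own scripts may assemble the pipeline components differently:

```python
# Minimal sketch: loading a Stable Diffusion checkpoint with diffusers.
# "sd1.5" is a placeholder path for your locally downloaded model;
# Video-P2P's scripts may load the model components differently.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "sd1.5",                    # local checkpoint directory
    torch_dtype=torch.float16,  # halves VRAM use on supported GPUs
)
pipe = pipe.to("cuda")
```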

Highlighted Details

  • CVPR 2024 paper.
  • Offers both a faster (1 min) and a more stable (10 min) mode for attention control on V100.
  • Includes a local Gradio demo for interactive use.
  • Released dataset available for download.

Maintenance & Community

The project is associated with dvlab-research and its authors. The README's "Todo" list shows its main milestones (code release, demo, and dataset release) completed, suggesting the planned deliverables have shipped and the project remains maintained.

Licensing & Compatibility

The README does not explicitly state a license. However, it references other projects like diffusers and Tune-A-Video, which are typically under permissive licenses (e.g., MIT, Apache 2.0). Compatibility for commercial use would require explicit license confirmation.

Limitations & Caveats

The project requires significant VRAM (20GB+), potentially limiting its use on consumer hardware. The xformers dependency might introduce compatibility issues on certain GPU configurations. The project is presented as an official implementation of a CVPR 2024 paper, implying it is research-oriented and may not have the robustness of production-ready software.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days
