Framework for consistent video editing using diffusion features (ICLR 2024)
Top 25.9% on sourcepulse
TokenFlow provides a framework for consistent video editing using pre-trained text-to-image diffusion models, targeting researchers and practitioners in generative AI for video. It enables high-quality, text-driven video edits while preserving the original video's spatial layout and dynamics, without requiring model retraining.
How It Works
TokenFlow enforces temporal consistency directly in the diffusion feature space: it explicitly propagates diffusion features across frames using fine-grained inter-frame correspondences that are readily available in the model's features. Because consistency is handled at the feature level, any off-the-shelf text-to-image editing technique, such as Plug-and-Play, ControlNet, or SDEdit, can be used for the per-frame structure-preserving edit. A minimal sketch of the propagation step follows below.
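The propagation step can be illustrated with a minimal sketch: edited diffusion features (tokens) from a few keyframes are copied to every other frame according to nearest-neighbor correspondences computed on the source video's features. This is an illustrative simplification, not the authors' implementation; the function names, tensor shapes, and the choice of a single nearest keyframe (the paper blends between neighboring keyframes) are assumptions made for the example.

```python
# Simplified sketch of TokenFlow-style feature propagation (illustrative only).
import torch
import torch.nn.functional as F

def nearest_neighbor_field(src_tokens, key_tokens):
    """For every token of a source frame, find the index of its nearest
    neighbor among a keyframe's tokens (cosine similarity)."""
    src = F.normalize(src_tokens, dim=-1)   # (N, C)
    key = F.normalize(key_tokens, dim=-1)   # (M, C)
    sim = src @ key.t()                     # (N, M) cosine similarities
    return sim.argmax(dim=-1)               # (N,) index of best match per token

def propagate_edited_tokens(source_frames, edited_keyframes, key_ids):
    """Replace each frame's tokens with edited keyframe tokens, matched
    per-token via correspondences computed on the *source* features."""
    edited_frames = []
    for i, src_tokens in enumerate(source_frames):
        # Simplification: use the temporally closest keyframe only.
        k = min(range(len(key_ids)), key=lambda j: abs(key_ids[j] - i))
        nn = nearest_neighbor_field(src_tokens, source_frames[key_ids[k]])
        edited_frames.append(edited_keyframes[k][nn])  # gather edited tokens
    return edited_frames

# Toy usage: 8 frames of 16x16 latent tokens with 320 channels.
frames = [torch.randn(256, 320) for _ in range(8)]
key_ids = [0, 4]
edited_keys = [frames[i] + 0.1 * torch.randn_like(frames[i]) for i in key_ids]
out = propagate_edited_tokens(frames, edited_keys, key_ids)
print(len(out), out[0].shape)  # 8 frames, each (256, 320)
```

Because the correspondences are computed on the unedited source features, the original video's spatial layout and motion constrain where each edited token lands, which is what keeps the edit temporally coherent.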
Quick Start & Requirements
Create a conda environment (`conda create -n tokenflow python=3.9`) and activate it (`conda activate tokenflow`), then install the dependencies (`pip install -r requirements.txt`). Preprocess the input video with `python preprocess.py --data_path <video_path> --inversion_prompt <prompt>`. Edit the relevant config file (e.g. `configs/config_pnp.yaml`) and run the corresponding script (`run_tokenflow_pnp.py`, `run_tokenflow_controlnet.py`, etc.), as consolidated in the sketch below.
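The following is a minimal sketch of these steps, assuming the scripts are run from the repository root; the video path and inversion prompt are placeholders, and the edit prompt itself is set in `configs/config_pnp.yaml` before running.

```python
# Minimal end-to-end sketch of the Quick Start steps above.
import subprocess

video_path = "data/example.mp4"                  # placeholder: path to the source video
inversion_prompt = "a woman running on a beach"  # placeholder: describes the source video

# 1) Invert the source video and cache its latents for later editing.
subprocess.run(
    ["python", "preprocess.py",
     "--data_path", video_path,
     "--inversion_prompt", inversion_prompt],
    check=True,
)

# 2) Run the Plug-and-Play editing variant (configs/config_pnp.yaml edited beforehand).
subprocess.run(["python", "run_tokenflow_pnp.py"], check=True)
```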
Highlighted Details
Maintenance & Community
The project is associated with authors from Google Research. Further community or roadmap details are not explicitly provided in the README.
Licensing & Compatibility
The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking is not detailed.
Limitations & Caveats
Edit quality depends on obtaining a faithful reconstruction of the source video during the preprocessing (inversion) step. The LDM decoder may also introduce minor jitter, depending on the original video content.