TokenFlow by omerbt

Framework for consistent video editing using diffusion features (ICLR 2024)

created 2 years ago
1,669 stars

Top 25.9% on sourcepulse

Project Summary

TokenFlow provides a framework for consistent video editing using pre-trained text-to-image diffusion models, targeting researchers and practitioners in generative AI for video. It enables high-quality, text-driven video edits while preserving the original video's spatial layout and dynamics, without requiring model retraining.

How It Works

TokenFlow achieves video editing consistency by enforcing it within the diffusion feature space. It explicitly propagates diffusion features across frames using inter-frame correspondences inherent in the diffusion model. This approach allows it to leverage any off-the-shelf text-to-image editing technique, such as Plug-and-Play, ControlNet, or SDEdit, for structure-preserving edits.
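The propagation idea can be illustrated with a toy sketch (this is not the official implementation): for each frame token, find its nearest-neighbor token in an edited keyframe's diffusion features, and copy the matched features over. The function name, cosine-similarity matching, and tensor shapes here are illustrative assumptions.

```python
import numpy as np

def propagate_features(key_feats, frame_feats):
    """Toy sketch of correspondence-based feature propagation
    (illustrative only, not TokenFlow's actual code).

    key_feats:   (n_key_tokens, d)  edited keyframe diffusion features
    frame_feats: (n_frame_tokens, d) current frame's diffusion features
    """
    def normalize(x):
        # Unit-normalize token features so dot products give cosine similarity.
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    key_n = normalize(key_feats)
    frame_n = normalize(frame_feats)
    # Correspondence: for each frame token, its most similar keyframe token.
    sim = frame_n @ key_n.T            # (n_frame_tokens, n_key_tokens)
    nn_idx = sim.argmax(axis=1)
    # Propagation: replace frame features with the matched keyframe features.
    return key_feats[nn_idx]

# Usage with random toy features
rng = np.random.default_rng(0)
key = rng.normal(size=(16, 8))
frame = rng.normal(size=(16, 8))
out = propagate_features(key, frame)
print(out.shape)  # (16, 8)
```

Because the edited features are copied along correspondences rather than regenerated per frame, edits stay coherent across frames regardless of which image-editing technique produced the keyframe features.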

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n tokenflow python=3.9) and activate it (conda activate tokenflow), then install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.9, PyTorch.
  • Preprocessing: Run python preprocess.py --data_path <video_path> --inversion_prompt <prompt>.
  • Editing: Configure via YAML files (e.g., configs/config_pnp.yaml) and run corresponding scripts (run_tokenflow_pnp.py, run_tokenflow_controlnet.py, etc.).
  • Docs: GitHub repository: https://github.com/omerbt/TokenFlow
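The two-step workflow above can be sketched as command lines; the video path and inversion prompt below are placeholders, not real assets.

```python
import shlex

# Hypothetical inputs for illustration only.
video_path = "data/my_video.mp4"
inversion_prompt = "a man surfing"

# Step 1: preprocess (DDIM inversion of the input video).
preprocess_cmd = (
    f"python preprocess.py --data_path {shlex.quote(video_path)} "
    f"--inversion_prompt {shlex.quote(inversion_prompt)}"
)

# Step 2: edit. Parameters live in a YAML file (e.g. configs/config_pnp.yaml),
# read by the matching script; here, the Plug-and-Play variant.
edit_cmd = "python run_tokenflow_pnp.py"

print(preprocess_cmd)
print(edit_cmd)
```

Swapping `run_tokenflow_pnp.py` for `run_tokenflow_controlnet.py` (with its corresponding config) switches the underlying image-editing technique without changing the workflow.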

Highlighted Details

  • Official PyTorch implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" (ICLR 2024).
  • Enables text-driven video editing without further training or finetuning.
  • Demonstrates state-of-the-art editing results on real-world videos.
  • Compatible with various image editing techniques (Plug-and-Play, ControlNet, SDEdit).

Maintenance & Community

The project is associated with authors from Google Research. Further community or roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

Editing quality depends on how faithfully the preprocessing step reconstructs the original video; a poor inversion degrades the edit. In addition, the LDM decoder may introduce minor jitter, depending on the original video content.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 90 days

Explore Similar Projects

Starred by Chenlin Meng (cofounder of Pika), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 1 more.

Tune-A-Video by showlab

Text-to-video generation via diffusion model fine-tuning
4k stars · created 2 years ago · updated 1 year ago