TokenFlow by omerbt

Framework for consistent video editing using diffusion features (ICLR 2024)

created 2 years ago
1,669 stars

Top 25.9% on sourcepulse

Project Summary

TokenFlow provides a framework for consistent video editing using pre-trained text-to-image diffusion models, targeting researchers and practitioners in generative AI for video. It enables high-quality, text-driven video edits while preserving the original video's spatial layout and dynamics, without requiring model retraining.

How It Works

TokenFlow achieves video editing consistency by enforcing it within the diffusion feature space. It explicitly propagates diffusion features across frames using inter-frame correspondences inherent in the diffusion model. This approach allows it to leverage any off-the-shelf text-to-image editing technique, such as Plug-and-Play, ControlNet, or SDEdit, for structure-preserving edits.
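The propagation idea can be illustrated with a toy sketch (this is not the official implementation): for each frame token, find its nearest-neighbor token in an edited keyframe's diffusion features, and copy the matched features over. The function name, cosine-similarity matching, and tensor shapes here are illustrative assumptions.

```python
import numpy as np

def propagate_features(key_feats, frame_feats):
    """Toy sketch of correspondence-based feature propagation
    (illustrative only, not TokenFlow's actual code).

    key_feats:   (n_key_tokens, d)  edited keyframe diffusion features
    frame_feats: (n_frame_tokens, d) current frame's diffusion features
    """
    def normalize(x):
        # Unit-normalize token features so dot products give cosine similarity.
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    key_n = normalize(key_feats)
    frame_n = normalize(frame_feats)
    # Correspondence: for each frame token, its most similar keyframe token.
    sim = frame_n @ key_n.T            # (n_frame_tokens, n_key_tokens)
    nn_idx = sim.argmax(axis=1)
    # Propagation: replace frame features with the matched keyframe features.
    return key_feats[nn_idx]

# Usage with random toy features
rng = np.random.default_rng(0)
key = rng.normal(size=(16, 8))
frame = rng.normal(size=(16, 8))
out = propagate_features(key, frame)
print(out.shape)  # (16, 8)
```

Because the edited features are copied along correspondences rather than regenerated per frame, edits stay coherent across frames regardless of which image-editing technique produced the keyframe features.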

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n tokenflow python=3.9) and activate it (conda activate tokenflow), then install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.9, PyTorch.
  • Preprocessing: Run python preprocess.py --data_path <video_path> --inversion_prompt <prompt>.
  • Editing: Configure via YAML files (e.g., configs/config_pnp.yaml) and run corresponding scripts (run_tokenflow_pnp.py, run_tokenflow_controlnet.py, etc.).
  • Docs: GitHub repository: https://github.com/omerbt/TokenFlow
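The two-step workflow above can be sketched as command lines; the video path and inversion prompt below are placeholders, not real assets.

```python
import shlex

# Hypothetical inputs for illustration only.
video_path = "data/my_video.mp4"
inversion_prompt = "a man surfing"

# Step 1: preprocess (DDIM inversion of the input video).
preprocess_cmd = (
    f"python preprocess.py --data_path {shlex.quote(video_path)} "
    f"--inversion_prompt {shlex.quote(inversion_prompt)}"
)

# Step 2: edit. Parameters live in a YAML file (e.g. configs/config_pnp.yaml),
# read by the matching script; here, the Plug-and-Play variant.
edit_cmd = "python run_tokenflow_pnp.py"

print(preprocess_cmd)
print(edit_cmd)
```

Swapping `run_tokenflow_pnp.py` for `run_tokenflow_controlnet.py` (with its corresponding config) switches the underlying image-editing technique without changing the workflow.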

Highlighted Details

  • Official PyTorch implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" (ICLR 2024).
  • Enables text-driven video editing without further training or finetuning.
  • Demonstrates state-of-the-art editing results on real-world videos.
  • Compatible with various image editing techniques (Plug-and-Play, ControlNet, SDEdit).

Maintenance & Community

The project is associated with authors from Google Research. Further community or roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

Editing quality depends on how faithfully the preprocessing step reconstructs the original video; a poor inversion degrades the edit. In addition, the LDM decoder may introduce minor jitter, depending on the original video content.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 90 days

Explore Similar Projects

Starred by Chenlin Meng (cofounder of Pika), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 1 more.

Tune-A-Video by showlab

Text-to-video generation via diffusion model fine-tuning
4k stars · created 2 years ago · updated 1 year ago