TokenFlow by omerbt

Framework for consistent video editing using diffusion features (ICLR 2024)

Created 2 years ago
1,678 stars

Top 25.2% on SourcePulse

Project Summary

TokenFlow provides a framework for consistent video editing using pre-trained text-to-image diffusion models, targeting researchers and practitioners in generative AI for video. It enables high-quality, text-driven video edits while preserving the original video's spatial layout and dynamics, without requiring model retraining.

How It Works

TokenFlow achieves video editing consistency by enforcing it within the diffusion feature space. It explicitly propagates diffusion features across frames using inter-frame correspondences inherent in the diffusion model. This approach allows it to leverage any off-the-shelf text-to-image editing technique, such as Plug-and-Play, ControlNet, or SDEdit, for structure-preserving edits.
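To make the propagation idea concrete, here is a toy, pure-Python sketch that roughly follows the paper's description. A few keyframes are edited. Every other frame then takes its tokens from the two nearest edited keyframes, using nearest-neighbor correspondences computed on the *original* video's features, and blends them by temporal distance. All names (`propagate`, `nearest_neighbors`, `cosine`) are hypothetical. The repository applies this to diffusion features inside the denoising network at each step, which this sketch does not attempt to reproduce.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Frame = List[Vector]  # one frame = a list of token feature vectors


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_neighbors(frame: Frame, key_frame: Frame) -> List[int]:
    """For each token in `frame`, the index of the most similar token in `key_frame`."""
    return [
        max(range(len(key_frame)), key=lambda j: cosine(tok, key_frame[j]))
        for tok in frame
    ]


def propagate(src: List[Frame], edited: Dict[int, Frame]) -> List[Frame]:
    """Fill every frame with edited keyframe tokens via source-feature correspondences.

    src:    features of the ORIGINAL video, one Frame per video frame
    edited: edited features for the keyframes only, keyed by frame index
    """
    keys = sorted(edited)
    out: List[Frame] = []
    for i, frame in enumerate(src):
        if i in edited:  # keyframes keep their own edited tokens
            out.append([list(tok) for tok in edited[i]])
            continue
        prev_k = max((k for k in keys if k < i), default=keys[0])
        next_k = min((k for k in keys if k > i), default=keys[-1])
        # linear weight by temporal distance (0 -> prev keyframe, 1 -> next keyframe)
        w = 0.0 if next_k == prev_k else (i - prev_k) / (next_k - prev_k)
        # correspondences come from SOURCE features, not from the edited ones
        nn_prev = nearest_neighbors(frame, src[prev_k])
        nn_next = nearest_neighbors(frame, src[next_k])
        out.append([
            [(1 - w) * a + w * b
             for a, b in zip(edited[prev_k][p], edited[next_k][n])]
            for p, n in zip(nn_prev, nn_next)
        ])
    return out


# Toy usage: 3 frames of 2 tokens; in frame 1 the two tokens have swapped places.
src = [[[1, 0], [0, 1]], [[0, 1], [1, 0]], [[1, 0], [0, 1]]]
edited = {0: [[10, 0], [0, 10]], 2: [[20, 0], [0, 20]]}
result = propagate(src, edited)
print(result[1])  # [[0.0, 15.0], [15.0, 0.0]]
```

Because frame 1's correspondences are found on the source features, each of its tokens follows its match in the keyframes even though the tokens moved, and the edited values are averaged since frame 1 sits halfway between keyframes 0 and 2.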

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n tokenflow python=3.9) and activate it (conda activate tokenflow), then install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.9, PyTorch.
  • Preprocessing: Run python preprocess.py --data_path <video_path> --inversion_prompt <prompt>.
  • Editing: Configure via YAML files (e.g., configs/config_pnp.yaml) and run corresponding scripts (run_tokenflow_pnp.py, run_tokenflow_controlnet.py, etc.).
  • Docs: GitHub repository (README): https://github.com/omerbt/TokenFlow
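The preprocessing and editing steps above can be scripted. Below is a minimal sketch, assuming it is run from the repository root. It only assembles command lines from the flags listed above (`--data_path`, `--inversion_prompt`) and the two editing scripts named there. The video path and prompt are placeholders. Calling the editing script with no arguments assumes it reads its settings from the corresponding YAML config (e.g. configs/config_pnp.yaml), so check the README before relying on it.

```python
import shlex
import subprocess
from typing import List

# Only the editing scripts named above; the README mentions others ("etc.") not listed here.
EDIT_SCRIPTS = {
    "pnp": "run_tokenflow_pnp.py",
    "controlnet": "run_tokenflow_controlnet.py",
}


def build_commands(video_path: str, inversion_prompt: str, method: str = "pnp") -> List[List[str]]:
    """Return argv lists for preprocessing (inversion) followed by editing."""
    if method not in EDIT_SCRIPTS:
        raise ValueError(f"unknown method {method!r}; expected one of {sorted(EDIT_SCRIPTS)}")
    preprocess = ["python", "preprocess.py",
                  "--data_path", video_path,
                  "--inversion_prompt", inversion_prompt]
    # Assumption: the edit script takes its prompt and paths from its YAML config.
    edit = ["python", EDIT_SCRIPTS[method]]
    return [preprocess, edit]


def run(commands: List[List[str]]) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


cmds = build_commands("data/my_video.mp4", "a man walking on the beach")
for cmd in cmds:
    print(shlex.join(cmd))  # dry run: print instead of executing
```

Calling `run(cmds)` would execute both steps. Printing first makes it easy to check how the prompt is quoted.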

Highlighted Details

  • Official PyTorch implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" (ICLR 2024).
  • Enables text-driven video editing without further training or finetuning.
  • Demonstrates state-of-the-art editing results on real-world videos.
  • Compatible with various image editing techniques (Plug-and-Play, ControlNet, SDEdit).

Maintenance & Community

The project is associated with authors from Google Research. Further community or roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

Edit quality depends on how well the video is reconstructed during preprocessing. The LDM decoder may also introduce minor jitter, depending on the original video content.
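Because edit quality hinges on the reconstruction, it can help to measure it before editing. The minimal sketch below is not part of TokenFlow. It assumes the original and reconstructed videos have already been decoded into per-frame lists of pixel values scaled to [0, 1]. It computes PSNR per frame and flags frames below a threshold. The 30 dB default and all function names are hypothetical choices.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # flattened pixel values in [0, 1]


def psnr(original: Frame, reconstructed: Frame, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB (infinite for a perfect match)."""
    if len(original) != len(reconstructed):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


def per_frame_psnr(video: List[Frame], recon: List[Frame]) -> List[float]:
    return [psnr(a, b) for a, b in zip(video, recon)]


def flag_poor_frames(video: List[Frame], recon: List[Frame], threshold_db: float = 30.0) -> List[int]:
    """Indices of frames whose reconstruction falls below the (hypothetical) threshold."""
    return [i for i, p in enumerate(per_frame_psnr(video, recon)) if p < threshold_db]


video = [[0.0, 0.5, 1.0], [0.2, 0.2, 0.2]]
recon = [[0.0, 0.5, 1.0], [0.3, 0.3, 0.3]]
scores = per_frame_psnr(video, recon)
bad = flag_poor_frames(video, recon)
print(bad)  # [1]
```

A low score concentrated in a few frames points at those frames as likely sources of artifacts in the edit. Frame-to-frame jitter from the decoder would need a separate temporal check.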

Health Check

Last Commit: 7 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 8 stars in the last 30 days
