showlab: Versatile video editing via natural language instructions and references
Top 95.6% on SourcePulse
Summary
Kiwi-Edit is a unified, open-source framework for advanced video editing guided by natural language instructions and reference images. It targets researchers and power users seeking flexible video manipulation capabilities, enabling tasks like style transfer, object manipulation, and background replacement through intuitive text prompts.
How It Works
The framework leverages a Multi-modal Large Language Model (MLLM) encoder combined with a video Diffusion Transformer (DiT) architecture. This approach allows for sophisticated understanding of textual instructions and visual references, facilitating precise video modifications. Its advantage lies in seamlessly integrating both instruction-only and reference-guided editing paradigms within a single, versatile system.
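To make the "single, versatile system" claim concrete, the sketch below shows one unified entry point that handles both editing paradigms: instruction-only when no references are supplied, reference-guided otherwise. All names here (`edit`, `encode_condition`) are hypothetical illustrations, not the project's real API, and the stub stands in for the MLLM-encoder-plus-DiT pipeline described above.

```python
def encode_condition(instruction, references):
    """Stand-in for the MLLM encoder: fuse the text instruction and any
    visual references into one conditioning signal for the video DiT."""
    cond = {"text": instruction}
    if references:
        cond["refs"] = list(references)
    return cond

def edit(video, instruction, references=None):
    """One entry point serves both paradigms: instruction-only editing
    when no references are given, reference-guided editing otherwise."""
    cond = encode_condition(instruction, references)
    mode = "reference-guided" if references else "instruction-only"
    # A real system would run DiT denoising conditioned on `cond` here;
    # this sketch only records which paradigm was selected.
    return {"video": video, "mode": mode, "condition": cond}
```

The point of the design is that callers never choose between two separate tools; the presence or absence of reference images selects the paradigm.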
Quick Start & Requirements
Native setup: run pip install -e . and install the remaining dependencies (DeepSpeed, FlashAttention, transformers, huggingface-hub, wandb); an install_full_env.sh script is provided as an alternative.
Diffusers setup: create an environment with conda create -n diffusers python=3.10 -y, then install diffusers, decord, einops, accelerate, transformers==4.57.0, opencv-python, and av.
Model weights (Wan-AI/Wan2.2-TI2V-5B) must be downloaded via the Hugging Face Hub.
Demos are provided for both the native setup (bash demo.py ...) and the Diffusers setup (python diffusers_demo.py ...).
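Taken together, the setup steps above can be sketched as a shell session. The script name, model ID, and pinned versions come from the README; the pip package names for DeepSpeed and FlashAttention (deepspeed, flash-attn) and the huggingface-cli download command are assumptions and may differ from the project's exact instructions.

```shell
# Native environment: editable install plus core dependencies
# (deepspeed / flash-attn package names are assumed, not confirmed)
pip install -e .
pip install deepspeed flash-attn transformers huggingface-hub wandb

# Alternative Diffusers environment
conda create -n diffusers python=3.10 -y
conda activate diffusers
pip install diffusers decord einops accelerate transformers==4.57.0 opencv-python av

# Fetch the base model weights from the Hugging Face Hub
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B
```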
Maintenance & Community
The project authors include Yiqi Lin, Guoqiang Liang, Ziyun Zeng, Zechen Bai, Yanzhe Chen, and Mike Zheng Shou. No specific community channels (e.g., Discord, Slack) or roadmap details are provided in the README.
Licensing & Compatibility
The license type is not explicitly stated in the README; anyone considering commercial use or closed-source integration should verify licensing with the authors first. The project relies on standard deep learning libraries such as PyTorch, Accelerate, and Hugging Face Transformers/Diffusers.
Limitations & Caveats
Strict environment requirements include Python 3.10, CUDA 12.8, and PyTorch 2.7. Installation involves multiple manual steps and downloading large model weights. The absence of a specified license poses a significant adoption blocker for many use cases. Gemini-based evaluation scripts necessitate careful API key management.
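On the API key caveat: a common pattern is to read the key from the environment rather than hard-coding it into evaluation scripts. The variable name GEMINI_API_KEY below is an assumption for illustration, not taken from the README.

```python
import os

def get_gemini_api_key():
    """Read the Gemini API key from the environment instead of
    embedding it in evaluation scripts or committing it to the repo.
    The variable name GEMINI_API_KEY is an assumed convention."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError(
            "Set GEMINI_API_KEY before running the evaluation scripts."
        )
    return key
```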
Last updated: 1 month ago · Inactive