Zero-shot video generator using text-to-image diffusion models
This repository provides the official implementation of Text2Video-Zero, a method that leverages text-to-image diffusion models for zero-shot video generation and editing. It targets researchers and developers working with generative AI, offering generation of videos from text prompts, conditional generation guided by pose or edge maps, and instruction-based editing of existing videos.
How It Works
Text2Video-Zero adapts a pre-trained text-to-image diffusion model to video generation with two lightweight modifications: it enriches the latent codes of the generated frames with motion dynamics to keep the global scene and background consistent over time, and it replaces each frame's self-attention with cross-frame attention on the first frame to preserve the appearance, identity, and context of foreground objects. The same mechanism supports conditional generation from pose or edge maps derived from an input video, without any training or fine-tuning on video data.
Quick Start & Requirements
Clone the repository, then install the dependencies:

```
pip install -r requirements.txt
```
Maintenance & Community
The project is actively maintained by Picsart AI Research. Community contributions are welcomed, with several external implementations and extensions linked in the README.
Licensing & Compatibility
The code is published under the CreativeML Open RAIL-M license, which permits research and commercial use but attaches use-based restrictions intended to ensure responsible AI use.
Limitations & Caveats
The project is an active research implementation. Low-VRAM optimizations are available, but performance can vary with hardware. Some features, such as background smoothing, require additional components not included in this repository.