Zero-shot video generator using text-to-image diffusion models
This repository provides the official implementation of Text2Video-Zero, a method that leverages text-to-image diffusion models for zero-shot video generation and editing. It targets researchers and developers working with generative AI, offering generation of videos from text prompts, conditional generation guided by pose or edge maps, and instruction-based editing of existing videos.
How It Works
Text2Video-Zero adapts a pre-trained text-to-image diffusion model to video generation with two lightweight modifications: it enriches the latent codes of the generated frames with motion dynamics to keep the global scene and background consistent over time, and it replaces each frame's self-attention with cross-frame attention on the first frame to preserve the appearance, identity, and context of foreground objects. The same mechanism supports conditional generation from pose or edge maps derived from an input video, without any training or fine-tuning on video data.
Quick Start & Requirements
Clone the repository, then install the dependencies:

```
pip install -r requirements.txt
```
Maintenance & Community
The project is actively maintained by Picsart AI Research. Community contributions are welcomed, with several external implementations and extensions linked in the README.
Licensing & Compatibility
The code is published under the CreativeML Open RAIL-M license, which permits research and commercial use but attaches use-based restrictions intended to ensure responsible AI use.
Limitations & Caveats
The project is an active research implementation. Low-VRAM optimizations are available, but performance can vary with hardware. Some features, such as background smoothing, require additional components not included in this repository.