PIA by open-mmlab

Image animator for personalized video generation via text prompts

Created 2 years ago

976 stars

Top 37.8% on SourcePulse

Project Summary

PIA is a personalized image animation method that generates videos from static images using text prompts, offering high motion controllability and strong text-image alignment. It is designed for researchers and practitioners in computer vision and generative AI, enabling the creation of custom animated content with fine-grained control over motion and style.

How It Works

PIA adds plug-and-play modules to an existing text-to-image diffusion model and works with personalized base models, such as those produced with DreamBooth. Users animate an existing image by supplying a text prompt that guides the motion and content of the generated video. This keeps control in the user's hands while relying on the generative capability of the underlying diffusion model.

Quick Start & Requirements

Installation: Run conda env create -f pia.yml, then conda activate pia. An alternative environment.yaml is available for PyTorch 1.13.1.
Prerequisites: PyTorch 2.0.0 is recommended because it provides scaled_dot_product_attention. git-lfs is required to download the checkpoints.
Checkpoints: Download Stable Diffusion v1-5, PIA checkpoints, and personalized models (RealisticVision, RcnzCartoon, MajicMix) from HuggingFace or Google Drive.
Demo: Available via HuggingFace, OpenXLab, and Colab.
Documentation: https://github.com/open-mmlab/PIA
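
The PyTorch recommendation above can be checked before running anything. The following is a minimal sketch and not part of PIA: the helper name has_sdpa is hypothetical, and it only tests whether the installed torch exposes torch.nn.functional.scaled_dot_product_attention, which was introduced in PyTorch 2.0.

```python
import importlib
import importlib.util


def has_sdpa() -> bool:
    """Return True if an installed torch exposes scaled_dot_product_attention.

    Hypothetical helper for illustration; not part of the PIA codebase.
    """
    # find_spec avoids an ImportError when torch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return False
    try:
        functional = importlib.import_module("torch.nn.functional")
    except ImportError:
        # torch is present but cannot be imported (for example, a broken install).
        return False
    return hasattr(functional, "scaled_dot_product_attention")


if __name__ == "__main__":
    if has_sdpa():
        print("scaled_dot_product_attention is available")
    else:
        print("scaled_dot_product_attention not found (torch missing or older than 2.0)")
```

If the check fails, the PyTorch 1.13.1 environment file mentioned above is the documented fallback, though the memory figure quoted for scaled_dot_product_attention may not apply in that case.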

Highlighted Details

Supports 1024x1024 image animation with 16GB GPU memory using scaled_dot_product_attention.
Offers control over motion magnitude via a magnitude parameter.
Enables style transfer by specifying a base model and using the --style_transfer flag.
Supports generating loopable videos with the --loop flag.
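
The options listed above could be combined on a single command line. The sketch below only assembles an argument list and does not run PIA. The script name inference.py, the config path, and the --config and --magnitude spellings are assumptions made for illustration; this summary names only a magnitude parameter and the --style_transfer and --loop flags, so check the repository README for the exact invocation.

```python
import shlex
from typing import List, Optional


def build_pia_command(
    config: str,
    magnitude: Optional[int] = None,
    style_transfer: bool = False,
    loop: bool = False,
    script: str = "inference.py",  # assumed entry point, not confirmed by this summary
) -> List[str]:
    """Assemble a hypothetical PIA command line as a list of arguments."""
    # --config is an assumed option name for passing the config file.
    cmd = ["python", script, f"--config={config}"]
    if magnitude is not None:
        # --magnitude is an assumed spelling of the "magnitude parameter".
        cmd.append(f"--magnitude={magnitude}")
    if style_transfer:
        cmd.append("--style_transfer")  # flag named in the summary
    if loop:
        cmd.append("--loop")  # flag named in the summary
    return cmd


if __name__ == "__main__":
    # Placeholder config path; replace with a real file from the repository.
    args = build_pia_command("path/to/config.yaml", magnitude=1, loop=True)
    print(shlex.join(args))
```

Returning a list rather than a string keeps the arguments safe to pass to subprocess.run without shell quoting issues.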

Maintenance & Community

The project is associated with OpenMMLab and is built upon AnimateDiff, Tune-a-Video, and PySceneDetect. Contact information for key contributors is provided.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

PIA accompanies a CVPR 2024 paper, so it is best treated as research code. The README does not list specific limitations or known issues.

