PIA by open-mmlab

Image animator for personalized video generation via text prompts

created 1 year ago
969 stars

Top 38.8% on sourcepulse

Project Summary

PIA is a personalized image animation method that generates videos from static images using text prompts, offering high motion controllability and strong text-image alignment. It is designed for researchers and practitioners in computer vision and generative AI, enabling the creation of custom animated content with fine-grained control over motion and style.

How It Works

PIA adds plug-and-play modules to a text-to-image diffusion model, so it can be combined with personalized models such as those produced with DreamBooth. Users animate an existing image by supplying a text prompt that guides the motion and content of the generated video, balancing user-defined control against the generative capabilities of the diffusion model.

Quick Start & Requirements

  • Installation: Run conda env create -f pia.yml, then conda activate pia. An alternative environment.yaml is available for PyTorch 1.13.1.
  • Prerequisites: PyTorch 2.0.0 is recommended for scaled_dot_product_attention support. git-lfs is required to download the checkpoints.
  • Checkpoints: Download Stable Diffusion v1-5, PIA checkpoints, and personalized models (RealisticVision, RcnzCartoon, MajicMix) from HuggingFace or Google Drive.
  • Demo: Available via HuggingFace, OpenXLab, and Colab.
  • Documentation: https://github.com/open-mmlab/PIA
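
The prerequisites above can be checked before setting up the project. The following is a minimal sketch, not part of PIA: the function name check_pia_prerequisites and the report keys are hypothetical, and it only tests whether git-lfs is on the PATH and whether the installed PyTorch exposes scaled_dot_product_attention.

```python
import importlib
import shutil


def check_pia_prerequisites():
    """Report whether the prerequisites listed above appear to be available.

    Hypothetical helper for illustration; it is not shipped with PIA.
    """
    report = {
        # git-lfs is needed to download the checkpoints.
        "git_lfs": shutil.which("git-lfs") is not None,
        "torch_installed": False,
        "sdpa_available": False,
    }
    try:
        functional = importlib.import_module("torch.nn.functional")
    except ImportError:
        # PyTorch is not importable in this environment.
        return report
    report["torch_installed"] = True
    # scaled_dot_product_attention was introduced in PyTorch 2.0.
    report["sdpa_available"] = hasattr(functional, "scaled_dot_product_attention")
    return report


if __name__ == "__main__":
    for name, ok in check_pia_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If sdpa_available is reported as missing, the installed PyTorch is probably older than 2.0, which matches the alternative environment.yaml path mentioned above.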

Highlighted Details

  • Supports 1024x1024 image animation with 16GB GPU memory using scaled_dot_product_attention.
  • Offers control over motion magnitude via a magnitude parameter.
  • Enables style transfer by specifying a base model and using the --style_transfer flag.
  • Supports generating loopable videos with the --loop flag.
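
The options above can be combined in a single run. The sketch below assembles a command line from them under stated assumptions: the helper build_pia_command is hypothetical, the script and config paths are supplied by the caller, and the --config= and --magnitude= spellings are assumptions that should be checked against the README. Only --loop and --style_transfer are named in this summary. Selecting the base model for style transfer is not modeled here.

```python
import shlex


def build_pia_command(script, config, magnitude=None, loop=False, style_transfer=False):
    """Assemble an argv list for a PIA run (hypothetical helper).

    `script` and `config` come from the caller; no file names are assumed.
    """
    # Assumption: the config file is passed as --config=<path>.
    argv = ["python", script, f"--config={config}"]
    if magnitude is not None:
        # Assumption: the magnitude parameter is passed as --magnitude=<value>.
        argv.append(f"--magnitude={magnitude}")
    if loop:
        # Flag named in the summary above: produce a loopable video.
        argv.append("--loop")
    if style_transfer:
        # Flag named in the summary above: enable style transfer.
        argv.append("--style_transfer")
    return argv


if __name__ == "__main__":
    # Placeholder paths; substitute the real script and config from the repository.
    cmd = build_pia_command("path/to/script.py", "path/to/config.yaml", magnitude=1, loop=True)
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the arguments safe to pass to subprocess.run without shell quoting issues.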

Maintenance & Community

The project is associated with OpenMMLab and is built upon AnimateDiff, Tune-a-Video, and PySceneDetect. Contact information for key contributors is provided.

Licensing & Compatibility

The README does not explicitly state a license, so terms for commercial use or closed-source linking are unspecified.

Limitations & Caveats

The project accompanies a CVPR 2024 paper and appears to be research-oriented. The README does not detail specific limitations or known issues.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days
