FollowYourPose by mayuelala

Research paper for pose-guided text-to-video generation

created 2 years ago
1,328 stars

Top 30.9% on sourcepulse

View on GitHub
Project Summary

Follow-Your-Pose addresses the generation of text-editable and pose-controllable character videos, a task crucial for digital human creation. It targets researchers and developers in AI-driven video synthesis, offering a method to create dynamic character animations guided by both textual descriptions and skeletal pose sequences.

How It Works

This project employs a two-stage training scheme that builds on pre-trained text-to-image models such as Stable Diffusion. The first stage trains a zero-initialized convolutional encoder on keypoint-image pairs to inject pose information. The second stage learns temporal coherence by adding learnable temporal self-attention and reformed cross-frame self-attention blocks, trained on pose-free video datasets. This design enables continuous pose control while retaining the editing and concept-composition capabilities of the base text-to-image model.
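The sketch below is a minimal, hypothetical illustration of the first-stage idea, not the repository's actual code: a convolutional pose encoder whose final layer is zero-initialized, so its residual contribution to the diffusion backbone's latent features starts at zero and the pre-trained text-to-image weights are left undisturbed. The module name, layer sizes, and tensor shapes are assumptions made for illustration.

```python
# Hypothetical sketch of a zero-initialized pose encoder (illustrative only;
# not the FollowYourPose implementation).
import torch
import torch.nn as nn

class ZeroInitPoseEncoder(nn.Module):
    """Encodes a rendered skeleton image into Stable Diffusion latent space."""
    def __init__(self, in_channels: int = 3, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, latent_channels, kernel_size=3, stride=2, padding=1),
        )
        # Zero-init the last projection: the pose branch is a no-op at the
        # start of training, preserving the pre-trained T2I behavior.
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, pose_map: torch.Tensor) -> torch.Tensor:
        return self.net(pose_map)

# Residual injection of pose features into the UNet's input latents.
encoder = ZeroInitPoseEncoder()
pose_map = torch.randn(1, 3, 512, 512)   # rendered keypoint/skeleton image
latents = torch.randn(1, 4, 64, 64)      # SD latents at 512/8 resolution
conditioned = latents + encoder(pose_map)
print(conditioned.shape)                 # torch.Size([1, 4, 64, 64])
```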

Quick Start & Requirements

  • Install: conda create -n fupose python=3.8, conda activate fupose, pip install -r requirements.txt
  • Prerequisites: CUDA 11, accelerate, and xformers (recommended on A100 GPUs for memory/speed optimization); a minimal setup sketch follows this list.
  • Resources: training was performed on 8 A100 GPUs; the local Gradio demo requires an A100 or a 3090.
  • Links: Colab Demo, Hugging Face Spaces, Project Page
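As referenced above, here is a minimal sketch of the base-model setup, assuming a standard diffusers workflow: it loads the public Stable Diffusion v1-4 checkpoint and opts into xformers memory-efficient attention when available. The actual Follow-Your-Pose pipeline, checkpoints, and launch scripts live in the repository and are not reproduced here.

```python
# Minimal sketch, assuming a diffusers-style setup (not the repo's own
# pipeline): load the Stable Diffusion v1-4 base weights and enable
# xformers attention if it is installed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

try:
    # Memory-efficient attention; recommended in the README for A100 GPUs.
    pipe.enable_xformers_memory_efficient_attention()
except Exception as exc:
    print(f"xformers unavailable, using default attention: {exc}")
```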

Highlighted Details

  • AAAI 2024 accepted paper.
  • Supports generating videos from raw video or skeleton video input.
  • Offers a local Gradio demo for interactive use.
  • Pre-trained checkpoints for Follow-Your-Pose and Stable Diffusion v1-4 are available.

Maintenance & Community

The project is maintained as a research codebase with further updates planned, and contact information for the key researchers is provided for discussion.

Licensing & Compatibility

The repository does not explicitly state a license. It borrows heavily from Tune-A-Video and FateZero, so users should verify the licensing of both this repository and those upstream projects before any commercial use.

Limitations & Caveats

The README notes that xformers installation can be unstable. Training was performed on 8 A100 GPUs, so reproducing the results requires substantial hardware. As a research codebase, the project is subject to ongoing changes and offers only experimental-level stability.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 20 stars in the last 90 days
