FollowYourPose by mayuelala

Research paper for pose-guided text-to-video generation

Created 2 years ago
1,338 stars

Top 30.0% on SourcePulse

Project Summary

Follow-Your-Pose addresses the generation of text-editable and pose-controllable character videos, a task crucial for digital human creation. It targets researchers and developers in AI-driven video synthesis, offering a method to create dynamic character animations guided by both textual descriptions and skeletal pose sequences.

How It Works

This project employs a novel two-stage training scheme that leverages pre-trained text-to-image models such as Stable Diffusion. The first stage trains a zero-initialized convolutional encoder on keypoint-image pairs to inject pose information. The second stage learns motion coherence from pose-free video datasets by adding learnable temporal self-attention and reformed cross-frame self-attention blocks. This design enables continuous pose control while retaining the editing and concept-composition capabilities of the base text-to-image model.
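
The sketch below shows these two building blocks in PyTorch. It is a minimal illustration of the ideas described above, not the repository's code; the class names, channel sizes, and tensor shapes are hypothetical.

```python
# Illustrative sketch only -- not the repository's implementation.
# Names (PoseEncoder, TemporalSelfAttention) and channel sizes are hypothetical.
import torch
import torch.nn as nn

def zero_module(m: nn.Module) -> nn.Module:
    """Zero all parameters so the module is a no-op at the start of training."""
    for p in m.parameters():
        nn.init.zeros_(p)
    return m

class PoseEncoder(nn.Module):
    """Stage 1: encode a skeleton/keypoint map into a residual for UNet features."""
    def __init__(self, in_channels: int = 3, out_channels: int = 320):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            # Zero-initialized output conv: the pose branch initially adds
            # nothing, preserving the pre-trained text-to-image prior.
            zero_module(nn.Conv2d(128, out_channels, 3, stride=2, padding=1)),
        )

    def forward(self, pose_map: torch.Tensor) -> torch.Tensor:
        return self.net(pose_map)

class TemporalSelfAttention(nn.Module):
    """Stage 2: attend across the frame axis so per-frame features stay coherent."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * spatial positions, frames, dim) -- the tokens are frames.
        h = self.norm(x)
        out, _ = self.attn(h, h, h)
        return x + out  # residual connection

enc = PoseEncoder()
print(enc(torch.randn(2, 3, 64, 64)).shape)   # torch.Size([2, 320, 16, 16])
tsa = TemporalSelfAttention(dim=320)
print(tsa(torch.randn(4, 8, 320)).shape)      # torch.Size([4, 8, 320])
```

The zero initialization is the important detail: at the first training step the pose branch contributes nothing, so optimization starts from the intact text-to-image prior rather than from noise injected by a randomly initialized encoder.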

Quick Start & Requirements

  • Install: conda create -n fupose python=3.8; conda activate fupose; pip install -r requirements.txt
  • Prerequisites: CUDA 11, accelerate, and xformers (recommended on A100 GPUs for memory/speed optimization); a quick sanity check follows this list.
  • Resources: Training was performed on 8 A100 GPUs. Local Gradio demo requires an A100/3090.
  • Links: Colab Demo, Hugging Face Spaces, Project Page
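
Before running the demo, a quick check of the environment can save time (an illustrative snippet, not part of the repository):

```python
# Verify that a CUDA GPU is visible and whether xformers imports cleanly.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

try:
    import xformers  # optional; recommended on A100 for memory/speed
    print("xformers:", xformers.__version__)
except ImportError:
    print("xformers not installed (the README notes its installation can be unstable)")
```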

Highlighted Details

  • AAAI 2024 accepted paper.
  • Supports generating videos from raw video or skeleton video input.
  • Offers a local Gradio demo for interactive use.
  • Pre-trained checkpoints for Follow-Your-Pose and the Stable Diffusion v1-4 base model are available; a loading sketch follows this list.
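
As a hedged illustration of obtaining the Stable Diffusion v1-4 base weights the project builds on, the snippet below uses Hugging Face diffusers and the public "CompVis/stable-diffusion-v1-4" checkpoint; the repository's own loading path may differ.

```python
# Pull the Stable Diffusion v1-4 base model from the Hugging Face Hub.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Sanity-check the base text-to-image prior with a single still image.
image = pipe("a person doing yoga on the beach").images[0]
image.save("sample.png")
```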

Maintenance & Community

The project is a research codebase; the README lists planned updates and provides contact information for the key researchers for discussion. Note, however, that recent activity is limited (see the Health Check below).

Licensing & Compatibility

The repository does not explicitly state a license. It borrows heavily from Tune-A-Video and FateZero, so the licenses of those upstream projects apply to the borrowed code; users should verify licensing before any commercial use.

Limitations & Caveats

The README notes that xformers installation can be unstable. Training used 8 A100 GPUs, so reproducing results requires substantial hardware. As a research codebase, the project may change without notice and should be treated as experimental.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Wing Lian (founder of Axolotl AI), and 10 more.

open_flamingo by mlfoundations

  • Top 0.1% on SourcePulse · 4k stars
  • Open-source framework for training large multimodal models
  • Created 2 years ago · Updated 1 year ago