Research paper for pose-guided text-to-video generation
Follow-Your-Pose addresses the generation of text-editable and pose-controllable character videos, a task crucial for digital human creation. It targets researchers and developers in AI-driven video synthesis, offering a method to create dynamic character animations guided by both textual descriptions and skeletal pose sequences.
How It Works
This project employs a two-stage training scheme built on a pre-trained text-to-image model (Stable Diffusion). The first stage trains a zero-initialized convolutional encoder on keypoint-image pairs to inject pose information. The second stage learns temporal coherence on pose-free video datasets by adding learnable temporal self-attention and reformed cross-frame self-attention blocks. This design allows continuous pose control while retaining the editing and concept-composition capabilities of the base text-to-image model.
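To make the first stage concrete, below is a minimal PyTorch sketch of what a zero-initialized convolutional pose encoder can look like: a rendered skeleton image is downsampled to the latent resolution and projected through a zero-initialized convolution, so the pre-trained text-to-image weights are untouched at the start of training. Channel sizes, layer counts, and the injection point are illustrative assumptions, not the repository's exact architecture.

import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    # Hypothetical sketch of a zero-initialized convolutional pose encoder.
    # It downsamples a rendered keypoint image to the latent resolution and
    # produces a residual added to the diffusion U-Net's input features.
    def __init__(self, in_channels: int = 3, feature_channels: int = 320):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 96, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Zero-initialized projection: at the start of training the encoder
        # contributes nothing, preserving the pre-trained text-to-image model.
        self.zero_conv = nn.Conv2d(96, feature_channels, 1)
        nn.init.zeros_(self.zero_conv.weight)
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, pose_image: torch.Tensor) -> torch.Tensor:
        return self.zero_conv(self.blocks(pose_image))

# Usage sketch: the pose residual is added to the U-Net features before denoising.
pose = torch.randn(1, 3, 512, 512)       # rendered skeleton image
features = torch.randn(1, 320, 64, 64)   # U-Net features after its input convolution
conditioned = features + PoseEncoder()(pose)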
Quick Start & Requirements
conda create -n fupose python=3.8
conda activate fupose
pip install -r requirements.txt
The environment also uses accelerate and xformers, recommended for memory/speed optimization on A100 GPUs.
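As an illustration of the xformers recommendation, the following hedged sketch shows how memory-efficient attention is commonly enabled in a diffusers-based pipeline (this project builds on Tune-A-Video, which uses diffusers); the checkpoint name and pipeline class here are placeholders, not the repository's actual entry point.

import torch
from diffusers import StableDiffusionPipeline

# Placeholder checkpoint; the actual repo loads its own fine-tuned weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

try:
    # Reduces attention memory and improves throughput on A100-class GPUs.
    pipe.enable_xformers_memory_efficient_attention()
except Exception as err:
    print(f"xformers not available, using default attention: {err}")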
Highlighted Details
Maintenance & Community
The project is actively maintained as a research codebase with ongoing updates planned. Contact information for key researchers is provided for discussions.
Licensing & Compatibility
The repository does not explicitly state a license. However, it heavily borrows from Tune-A-Video and FateZero, which are typically released under permissive licenses. Users should verify licensing for commercial use.
Limitations & Caveats
The README notes that xformers installation can be unstable. Training used 8 A100 GPUs, so reproducing the results carries significant hardware requirements. As a research codebase, the project is subject to ongoing changes and offers experimental, rather than production-grade, stability.