Research paper for pose-guided text-to-video generation
Follow-Your-Pose addresses the generation of text-editable and pose-controllable character videos, a task crucial for digital human creation. It targets researchers and developers in AI-driven video synthesis, offering a method to create dynamic character animations guided by both textual descriptions and skeletal pose sequences.
How It Works
This project employs a two-stage training scheme built on a pre-trained text-to-image model (Stable Diffusion). The first stage trains a zero-initialized convolutional encoder on keypoint-image pairs to inject pose information. The second stage learns temporal coherence on pose-free video datasets by adding learnable temporal self-attention and reformed cross-frame self-attention blocks. This design allows continuous pose control while retaining the editing and concept-composition capabilities of the base text-to-image model.
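To make the first stage concrete, below is a minimal PyTorch sketch of what a zero-initialized convolutional pose encoder can look like: a rendered skeleton image is downsampled to the latent resolution and projected through a zero-initialized convolution, so the pre-trained text-to-image weights are untouched at the start of training. Channel sizes, layer counts, and the injection point are illustrative assumptions, not the repository's exact architecture.

import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    # Hypothetical sketch of a zero-initialized convolutional pose encoder.
    # It downsamples a rendered keypoint image to the latent resolution and
    # produces a residual added to the diffusion U-Net's input features.
    def __init__(self, in_channels: int = 3, feature_channels: int = 320):
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 96, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Zero-initialized projection: at the start of training the encoder
        # contributes nothing, preserving the pre-trained text-to-image model.
        self.zero_conv = nn.Conv2d(96, feature_channels, 1)
        nn.init.zeros_(self.zero_conv.weight)
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, pose_image: torch.Tensor) -> torch.Tensor:
        return self.zero_conv(self.blocks(pose_image))

# Usage sketch: the pose residual is added to the U-Net features before denoising.
pose = torch.randn(1, 3, 512, 512)       # rendered skeleton image
features = torch.randn(1, 320, 64, 64)   # U-Net features after its input convolution
conditioned = features + PoseEncoder()(pose)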
Quick Start & Requirements
conda create -n fupose python=3.8
conda activate fupose
pip install -r requirements.txt
The environment also uses accelerate and xformers, recommended for memory/speed optimization on A100 GPUs.
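As an illustration of the xformers recommendation, the following hedged sketch shows how memory-efficient attention is commonly enabled in a diffusers-based pipeline (this project builds on Tune-A-Video, which uses diffusers); the checkpoint name and pipeline class here are placeholders, not the repository's actual entry point.

import torch
from diffusers import StableDiffusionPipeline

# Placeholder checkpoint; the actual repo loads its own fine-tuned weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

try:
    # Reduces attention memory and improves throughput on A100-class GPUs.
    pipe.enable_xformers_memory_efficient_attention()
except Exception as err:
    print(f"xformers not available, using default attention: {err}")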
Highlighted Details
Maintenance & Community
The project is actively maintained as a research codebase with ongoing updates planned. Contact information for key researchers is provided for discussions.
Licensing & Compatibility
The repository does not explicitly state a license. However, it heavily borrows from Tune-A-Video and FateZero, which are typically released under permissive licenses. Users should verify licensing for commercial use.
Limitations & Caveats
The README notes that xformers installation can be unstable. Training used 8 A100 GPUs, so reproducing the results carries significant hardware requirements. As a research codebase, the project is subject to ongoing changes and offers experimental, rather than production-grade, stability.