PyTorch code for human/animal pose estimation research
ViTPose provides PyTorch implementations for state-of-the-art human pose estimation using Vision Transformers (ViT). It offers both single-task and multi-task training configurations, achieving high accuracy on benchmarks like MS COCO, OCHuman, and MPII. The project is suitable for researchers and practitioners in computer vision focused on human pose analysis.
How It Works
ViTPose builds on plain Vision Transformer backbones as well as ViTAE, a ViT variant that adds convolution-based inductive biases for improved performance. It explores backbones of several capacities (ViT-Small, Base, Large, Huge, and the larger ViTAE-G) and two decoder designs (a classic deconvolution decoder and a lighter simple decoder) for pose estimation. The backbones are pre-trained with Masked Autoencoders (MAE) and fine-tuned on various human and animal pose datasets.
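As a rough illustration of this design (a sketch, not the repository's code), the snippet below mirrors the classic decoder: patch tokens from a ViT backbone are reshaped into a 2D feature map and upsampled by two deconvolutions into one heatmap per keypoint. The dimensions assume a 256x192 input with 16x16 patches; all layer sizes here are illustrative.

```python
# Schematic sketch of the ViTPose idea: a plain ViT backbone yields patch
# tokens, and a small decoder turns them into per-keypoint heatmaps.
import torch
import torch.nn as nn

class SimpleDecoder(nn.Module):
    """Upsample ViT patch features to keypoint heatmaps (illustrative)."""
    def __init__(self, embed_dim=768, num_keypoints=17):
        super().__init__()
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(embed_dim, 256, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 256, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(256, num_keypoints, kernel_size=1)

    def forward(self, tokens, h, w):
        # tokens: (B, h*w, C) patch embeddings from the ViT backbone.
        feat = tokens.transpose(1, 2).reshape(tokens.size(0), -1, h, w)
        return self.head(self.deconv(feat))

# For a 256x192 input with 16x16 patches, the token grid is 16x12.
tokens = torch.randn(1, 16 * 12, 768)       # stand-in for ViT output
heatmaps = SimpleDecoder()(tokens, 16, 12)  # -> (1, 17, 64, 48)
print(heatmaps.shape)
```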
Quick Start & Requirements
Setup requires mmcv (specifically v1.3.9) and the ViTPose repository. Installation involves running pip install -e . in each of the two, followed by pip install timm==0.4.9 einops. The mmcv pin to v1.3.9 should be treated as a hard requirement.
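For a quick sanity check after installation, the sketch below runs top-down inference with a pretrained checkpoint, assuming the mmpose 0.x Python API that this codebase is built on; the config and checkpoint paths are illustrative placeholders, not guaranteed filenames.

```python
# Minimal top-down inference sketch (mmpose 0.x API assumed).
from mmpose.apis import (init_pose_model, inference_top_down_pose_model,
                         vis_pose_result)

# Placeholder paths -- substitute a real ViTPose config and checkpoint.
config = 'configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_base_coco_256x192.py'
checkpoint = 'vitpose-b.pth'

model = init_pose_model(config, checkpoint, device='cuda:0')

# Top-down models expect person bounding boxes; here, one whole-image box.
person_results = [{'bbox': [0, 0, 640, 480]}]
pose_results, _ = inference_top_down_pose_model(
    model, 'demo.jpg', person_results, format='xywh')

vis_pose_result(model, 'demo.jpg', pose_results, out_file='vis_demo.jpg')
```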
Highlighted Details
Maintenance & Community
The project is associated with the authors of the ViTPose and ViTAE papers. Updates are posted in the README; the most recent notes a mixture-of-experts (MoE) strategy for training a single model jointly across multiple pose estimation tasks (sketched below). Links to the relevant papers and citation entries are included.
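A hedged sketch of that task-MoE idea is shown below: each transformer block's feed-forward network keeps a shared part plus one expert per dataset, and tokens are routed to the expert matching their task. This is an illustration of the concept, not the repository's implementation; all names and sizes are assumptions.

```python
# Illustrative task-MoE feed-forward layer: shared features are
# concatenated with a task-specific expert's features.
import torch
import torch.nn as nn

class TaskMoEFFN(nn.Module):
    def __init__(self, dim=768, hidden=3072, num_tasks=3):
        super().__init__()
        shared_hidden = hidden // 2
        expert_hidden = hidden - shared_hidden
        self.shared = nn.Linear(dim, shared_hidden)
        self.experts = nn.ModuleList(
            nn.Linear(dim, expert_hidden) for _ in range(num_tasks))
        self.act = nn.GELU()
        self.out = nn.Linear(hidden, dim)

    def forward(self, x, task_id):
        # Route every token to the expert for this batch's task.
        h = torch.cat([self.shared(x), self.experts[task_id](x)], dim=-1)
        return self.out(self.act(h))

x = torch.randn(1, 192, 768)      # (batch, tokens, dim)
ffn = TaskMoEFFN()
print(ffn(x, task_id=0).shape)    # -> torch.Size([1, 192, 768])
```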
Licensing & Compatibility
The repository is released under an unspecified license. However, it acknowledges implementations from mmpose
and MAE, which may have their own licensing terms. Compatibility for commercial use or closed-source linking is not explicitly stated.
Limitations & Caveats
The README mentions potential duplicate images in the CrowdPose training set affecting evaluation. Some pre-trained weights for specific tasks (e.g., InterHand2.6M) are listed as "Coming Soon." The exact licensing for commercial use is not detailed.