PyTorch code for human/animal pose estimation research
ViTPose provides PyTorch implementations for state-of-the-art human pose estimation using Vision Transformers (ViT). It offers both single-task and multi-task training configurations, achieving high accuracy on benchmarks like MS COCO, OCHuman, and MPII. The project is suitable for researchers and practitioners in computer vision focused on human pose analysis.
How It Works
ViTPose builds on plain Vision Transformer backbones as well as ViTAE, a ViT variant that adds convolution-based inductive biases for improved performance. It explores backbones of several capacities (ViT-Small, Base, Large, Huge, and the larger ViTAE-G) and two decoder designs (a classic deconvolution decoder and a lighter simple decoder) for pose estimation. The backbones are pre-trained with Masked Autoencoders (MAE) and fine-tuned on various human and animal pose datasets.
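As a rough illustration of this design (a sketch, not the repository's code), the snippet below mirrors the classic decoder: patch tokens from a ViT backbone are reshaped into a 2D feature map and upsampled by two deconvolutions into one heatmap per keypoint. The dimensions assume a 256x192 input with 16x16 patches; all layer sizes here are illustrative.

```python
# Schematic sketch of the ViTPose idea: a plain ViT backbone yields patch
# tokens, and a small decoder turns them into per-keypoint heatmaps.
import torch
import torch.nn as nn

class SimpleDecoder(nn.Module):
    """Upsample ViT patch features to keypoint heatmaps (illustrative)."""
    def __init__(self, embed_dim=768, num_keypoints=17):
        super().__init__()
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(embed_dim, 256, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 256, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(256, num_keypoints, kernel_size=1)

    def forward(self, tokens, h, w):
        # tokens: (B, h*w, C) patch embeddings from the ViT backbone.
        feat = tokens.transpose(1, 2).reshape(tokens.size(0), -1, h, w)
        return self.head(self.deconv(feat))

# For a 256x192 input with 16x16 patches, the token grid is 16x12.
tokens = torch.randn(1, 16 * 12, 768)       # stand-in for ViT output
heatmaps = SimpleDecoder()(tokens, 16, 12)  # -> (1, 17, 64, 48)
print(heatmaps.shape)
```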
Quick Start & Requirements
Setup requires mmcv (specifically v1.3.9) and the ViTPose repository. Installation involves running pip install -e . in each of the two, followed by pip install timm==0.4.9 einops. The mmcv pin to v1.3.9 should be treated as a hard requirement.
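For a quick sanity check after installation, the sketch below runs top-down inference with a pretrained checkpoint, assuming the mmpose 0.x Python API that this codebase is built on; the config and checkpoint paths are illustrative placeholders, not guaranteed filenames.

```python
# Minimal top-down inference sketch (mmpose 0.x API assumed).
from mmpose.apis import (init_pose_model, inference_top_down_pose_model,
                         vis_pose_result)

# Placeholder paths -- substitute a real ViTPose config and checkpoint.
config = 'configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_base_coco_256x192.py'
checkpoint = 'vitpose-b.pth'

model = init_pose_model(config, checkpoint, device='cuda:0')

# Top-down models expect person bounding boxes; here, one whole-image box.
person_results = [{'bbox': [0, 0, 640, 480]}]
pose_results, _ = inference_top_down_pose_model(
    model, 'demo.jpg', person_results, format='xywh')

vis_pose_result(model, 'demo.jpg', pose_results, out_file='vis_demo.jpg')
```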
Highlighted Details
Maintenance & Community
The project is associated with the authors of the ViTPose and ViTAE papers. Updates are posted in the README; the most recent notes a mixture-of-experts (MoE) strategy for training a single model jointly across multiple pose estimation tasks (sketched below). Links to the relevant papers and citation entries are included.
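A hedged sketch of that task-MoE idea is shown below: each transformer block's feed-forward network keeps a shared part plus one expert per dataset, and tokens are routed to the expert matching their task. This is an illustration of the concept, not the repository's implementation; all names and sizes are assumptions.

```python
# Illustrative task-MoE feed-forward layer: shared features are
# concatenated with a task-specific expert's features.
import torch
import torch.nn as nn

class TaskMoEFFN(nn.Module):
    def __init__(self, dim=768, hidden=3072, num_tasks=3):
        super().__init__()
        shared_hidden = hidden // 2
        expert_hidden = hidden - shared_hidden
        self.shared = nn.Linear(dim, shared_hidden)
        self.experts = nn.ModuleList(
            nn.Linear(dim, expert_hidden) for _ in range(num_tasks))
        self.act = nn.GELU()
        self.out = nn.Linear(hidden, dim)

    def forward(self, x, task_id):
        # Route every token to the expert for this batch's task.
        h = torch.cat([self.shared(x), self.experts[task_id](x)], dim=-1)
        return self.out(self.act(h))

x = torch.randn(1, 192, 768)      # (batch, tokens, dim)
ffn = TaskMoEFFN()
print(ffn(x, task_id=0).shape)    # -> torch.Size([1, 192, 768])
```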
Licensing & Compatibility
The repository is released under an unspecified license. However, it acknowledges implementations from mmpose
and MAE, which may have their own licensing terms. Compatibility for commercial use or closed-source linking is not explicitly stated.
Limitations & Caveats
The README mentions potential duplicate images in the CrowdPose training set affecting evaluation. Some pre-trained weights for specific tasks (e.g., InterHand2.6M) are listed as "Coming Soon." The exact licensing for commercial use is not detailed.