ViTPose by ViTAE-Transformer

PyTorch code for human/animal pose estimation research

created 3 years ago
1,694 stars

Top 25.6% on sourcepulse

View on GitHub
Project Summary

ViTPose provides PyTorch implementations for state-of-the-art human pose estimation using Vision Transformers (ViT). It offers both single-task and multi-task training configurations, achieving high accuracy on benchmarks like MS COCO, OCHuman, and MPII. The project is suitable for researchers and practitioners in computer vision focused on human pose analysis.

How It Works

ViTPose pairs plain Vision Transformer (ViT) backbones with lightweight decoders for pose estimation. It explores backbones of different scales (Small, Base, Large, Huge, and the larger ViTAE-G, a ViT variant that adds convolution-based inductive biases) together with two decoder designs (a classic deconvolution head and a simpler one). The backbones are pre-trained with Masked Autoencoders (MAE) and then fine-tuned on various human and animal pose datasets.
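The overall design can be pictured with a minimal, self-contained PyTorch sketch (illustrative only, not the repository's code): a plain ViT-style patch encoder followed by a small deconvolution decoder that predicts one heatmap per keypoint. The class name TinyViTPose and every dimension below are made up for the example.

    # Minimal sketch of the ViTPose idea: ViT-style patch encoder + deconv decoder.
    # All sizes are illustrative, not the paper's configurations.
    import torch
    import torch.nn as nn

    class TinyViTPose(nn.Module):
        def __init__(self, img_size=256, patch=16, dim=256, depth=4, heads=8, num_keypoints=17):
            super().__init__()
            self.grid = img_size // patch                      # patches per side
            self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
            self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, dim))
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               dim_feedforward=dim * 4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            # "classic"-style decoder: deconvolutions upsample the patch grid,
            # then a 1x1 conv maps features to one heatmap per keypoint
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(dim, num_keypoints, kernel_size=1),
            )

        def forward(self, x):                                  # x: (B, 3, H, W)
            tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
            tokens = self.encoder(tokens + self.pos)
            feat = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
            return self.decoder(feat)                          # (B, K, H/4, W/4)

    heatmaps = TinyViTPose()(torch.randn(1, 3, 256, 256))
    print(heatmaps.shape)  # torch.Size([1, 17, 64, 64])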

Quick Start & Requirements

  • Installation: Clone both mmcv (checking out v1.3.9) and the ViTPose repository, install each with pip install -e ., then run pip install timm==0.4.9 einops.
  • Prerequisites: PyTorch 1.9.0 (or the NGC 21.06 Docker image) and mmcv 1.3.9.
  • Usage: Training and testing scripts are provided; pre-trained models are available via OneDrive links. A minimal inference sketch follows this list.
  • Demo: A web demo is integrated into Huggingface Spaces using Gradio.
  • Docs: Configuration files and logs are linked for detailed usage.
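
Once installed, a fine-tuned checkpoint can be exercised through the top-down inference helpers of mmpose 0.x, which ViTPose builds on. This is a hedged sketch rather than the repository's documented demo: the config path, checkpoint filename, image name, and the whole-image bounding box are placeholders.

    # Hedged sketch: ViTPose extends mmpose 0.x, so a fine-tuned checkpoint can
    # usually be run through mmpose's top-down inference helpers.
    # Paths, the image name, and the person box below are placeholders.
    from mmpose.apis import (init_pose_model, inference_top_down_pose_model,
                             vis_pose_result)

    config = 'configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/ViTPose_base_coco_256x192.py'  # placeholder
    checkpoint = 'vitpose-b.pth'  # weights downloaded from the OneDrive links

    model = init_pose_model(config, checkpoint, device='cuda:0')

    # a single person box covering the whole image, in xywh format
    person_results = [{'bbox': [0, 0, 640, 480]}]
    pose_results, _ = inference_top_down_pose_model(
        model, 'demo.jpg', person_results, format='xywh',
        dataset='TopDownCocoDataset')

    # draw the predicted keypoints onto the image
    vis_pose_result(model, 'demo.jpg', pose_results, out_file='vis_demo.jpg')

Because ViTPose reuses mmpose's config system, switching backbone size or target dataset is largely a matter of pointing at a different config file and checkpoint.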

Highlighted Details

  • Achieves 81.1 AP on the MS COCO Keypoint test-dev set with ViTPose-G.
  • Offers models pre-trained with MAE, with weights for various sizes available.
  • Supports multi-task learning across human, animal, and whole-body pose estimation.
  • Provides comprehensive benchmark results on multiple datasets (MS COCO, OCHuman, MPII, CrowdPose, AP10K, APT36K, WholeBody, InterHand2.6M).

Maintenance & Community

The project is maintained by the authors of the ViTPose and ViTAE papers. The README carries an update log; the most recent entries mention mixture-of-experts (MoE) strategies for jointly training on multiple pose estimation tasks. Links to the relevant papers and citation entries are included.

Licensing & Compatibility

The repository is released under an unspecified license. However, it acknowledges implementations from mmpose and MAE, which may have their own licensing terms. Compatibility for commercial use or closed-source linking is not explicitly stated.

Limitations & Caveats

The README mentions potential duplicate images in the CrowdPose training set affecting evaluation. Some pre-trained weights for specific tasks (e.g., InterHand2.6M) are listed as "Coming Soon." The exact licensing for commercial use is not detailed.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 111 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Omar Sanseviero (DevRel at Google DeepMind), and 12 more.

pytorch-image-models by huggingface

  • Top 0.2% on sourcepulse
  • 35k stars
  • PyTorch image model collection with training, eval, and inference scripts
  • created 6 years ago, updated 1 day ago