TokenHMR  by saidwivedi

Research paper advancing human mesh recovery via tokenization

created 1 year ago
315 stars

Top 86.9% on sourcepulse

Project Summary

TokenHMR introduces a tokenized pose representation for 3D human mesh recovery, aimed at the limited 3D accuracy of existing methods. It targets researchers and practitioners in computer vision and graphics who work on human pose estimation and 3D reconstruction. The method improves 3D accuracy by reformulating pose regression as token prediction.

How It Works

TokenHMR employs a two-stage approach. First, an encoder maps continuous poses to discrete pose tokens, creating a "vocabulary" of valid poses. Second, the TokenHMR model estimates human pose in this tokenized representation. Combined with Threshold-Adaptive Loss Scaling (TALS), this tokenization strategy allows the model to learn a more robust and accurate 3D representation without imposing strong prior biases.
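The two ideas in this section can be illustrated with a small NumPy sketch. It is not the repository's code: the function names, the toy codebook, and the `threshold` and `scale` parameters are all hypothetical. Tokenization is shown as a nearest-neighbor lookup in a codebook, which is a common way to discretize continuous vectors. The loss rule is only one plausible reading of "threshold-adaptive" scaling, assuming that losses below a threshold are down-weighted rather than removed.

```python
import numpy as np


def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each continuous latent vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance between every latent (N, D) and every code (K, D).
    diffs = latents[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)  # shape (N, K)
    return np.argmin(dists, axis=1)  # one discrete token per latent


def dequantize(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Look the tokens up in the codebook to recover quantized latent vectors."""
    return codebook[tokens]


def threshold_scaled_loss(loss: np.ndarray, threshold: float, scale: float) -> np.ndarray:
    """Assumed rule: keep losses above `threshold`, multiply the rest by `scale`."""
    return np.where(loss > threshold, loss, scale * loss)


if __name__ == "__main__":
    # Toy codebook with K=3 entries of dimension D=2 (hypothetical values).
    codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
    latents = np.array([[0.9, 1.2], [-0.8, 0.7], [0.1, -0.1]])
    tokens = quantize(latents, codebook)
    print("tokens:", tokens)
    print("quantized latents:", dequantize(tokens, codebook))
    print("scaled losses:", threshold_scaled_loss(np.array([0.02, 0.5]), 0.1, 0.1))
```

In a learned tokenizer the codebook would be trained together with the encoder and a decoder; here it is fixed by hand so that the lookup step is easy to follow.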

Quick Start & Requirements

  • Installation: Clone the repository, create a Conda environment with Python 3.10, and install PyTorch (tested with 2.1.0/CUDA 11.8) and other dependencies via requirements.txt.
  • Prerequisites: Python 3.10, PyTorch 2.1.0, CUDA 11.8 (recommended), Detectron2 (for image demos), and a forked version of PHALP (for video demos).
  • Data: Download the SMPL/SMPLH body models and checkpoints with fetch_demo_data.sh. Training and evaluation require additional datasets (AMASS, MOYO, BEDLAM, 4DHumans, 3DPW, EMDB), each of which requires registration and agreement to its license.
  • Links: Website, YouTube, arXiv.
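Because the setup depends on specific versions, a quick check before running the demos can save time. The script below is a minimal sketch, not part of the repository: `check_versions` and `installed_torch_versions` are hypothetical helper names, and the expected values are simply the versions listed above.

```python
import sys
from importlib import metadata

# Versions reported as tested in the list above; adjust if the repository changes.
EXPECTED_PYTHON = (3, 10)
EXPECTED_TORCH = "2.1.0"
EXPECTED_CUDA = "11.8"


def check_versions(python_version, torch_version, cuda_version):
    """Return a list of human-readable mismatches (empty when everything matches)."""
    problems = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, 3.10 expected"
        )
    if torch_version is None:
        problems.append("PyTorch is not installed")
        return problems
    if not torch_version.startswith(EXPECTED_TORCH):
        problems.append(f"PyTorch {torch_version} found, {EXPECTED_TORCH} expected")
    if cuda_version != EXPECTED_CUDA:
        problems.append(f"CUDA {cuda_version} found, {EXPECTED_CUDA} recommended")
    return problems


def installed_torch_versions():
    """Read the installed PyTorch and CUDA versions without failing if torch is absent."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None, None
    import torch  # imported lazily so the script still runs without PyTorch

    return torch_version, torch.version.cuda


if __name__ == "__main__":
    torch_version, cuda_version = installed_torch_versions()
    problems = check_versions(sys.version_info, torch_version, cuda_version)
    for problem in problems:
        print("WARNING:", problem)
    if not problems:
        print("Environment matches the tested configuration.")
```

The comparison logic is kept separate from the lookup so that it can be exercised with plain strings, without PyTorch installed.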

Highlighted Details

  • Achieves state-of-the-art results on benchmarks such as 3DPW and EMDB, with reported MPJPE values as low as 44.8 mm.
  • Offers multiple pre-trained models for different configurations and training iterations.
  • Includes code for both tokenization and the main TokenHMR model.
  • Provides demo scripts for running inference on single images and videos.
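For readers unfamiliar with the metric quoted above, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joints. The sketch below is a generic NumPy version, not the repository's evaluation code. It assumes both skeletons are aligned at a root joint first, which is a common convention; the `root_index` parameter and the toy joint values are illustrative.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray, root_index: int = 0) -> float:
    """Mean per-joint position error between two (J, 3) joint arrays, in input units."""
    # Root alignment: express both skeletons relative to their root joint,
    # so a global translation of the prediction does not count as error.
    pred_aligned = pred - pred[root_index]
    gt_aligned = gt - gt[root_index]
    per_joint_error = np.linalg.norm(pred_aligned - gt_aligned, axis=-1)
    return float(per_joint_error.mean())


if __name__ == "__main__":
    # Three toy joints in millimetres (hypothetical values).
    gt = np.array([[0.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 200.0, 0.0]])
    pred = gt + np.array([10.0, 0.0, 0.0])  # global shift, removed by root alignment
    pred[2] += np.array([0.0, 0.0, 30.0])   # one joint is off by 30 mm
    print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```

Benchmarks differ in which joints they evaluate and how they align predictions, so numbers from this sketch are not directly comparable to published results.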

Maintenance & Community

The project is associated with the Max Planck Institute for Intelligent Systems and ETH Zurich. The latest update was on July 2, 2024, releasing a new model for diverse poses. Contact information for code and commercial licensing is provided.

Licensing & Compatibility

The code is available for non-commercial scientific research purposes. Commercial licensing inquiries should be directed to ps-licensing@tuebingen.mpg.de. Third-party datasets are subject to their respective licenses.

Limitations & Caveats

The project requires Python 3.10 and a specific PyTorch build (tested with 2.1.0/CUDA 11.8), along with careful setup of CUDA and other dependencies. Downloading and preparing the datasets involves registration and agreement to terms, which may be a barrier to quick adoption.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 27 stars in the last 90 days
