TokenHMR by saidwivedi

Research paper advancing human mesh recovery via tokenization

Created 1 year ago

331 stars

Top 82.8% on SourcePulse

Project Summary

TokenHMR introduces a tokenized pose representation for 3D human mesh recovery, aimed at the accuracy limitations of existing methods. It targets researchers and practitioners in computer vision and graphics who work on human pose estimation and 3D reconstruction. By reformulating pose regression as token prediction, the method reports improved 3D accuracy.

How It Works

TokenHMR uses a two-stage approach. First, an encoder maps continuous poses to discrete pose tokens, creating a "vocabulary" of valid poses. Second, the TokenHMR model uses this tokenized representation to estimate human pose. Combined with Threshold-Adaptive Loss Scaling (TALS), the tokenization lets the model learn a more robust and accurate 3D representation without imposing strong prior biases.
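To make the idea of discrete pose tokens concrete, here is a minimal sketch of nearest-neighbour quantization against a fixed codebook. It is a toy illustration only, not the TokenHMR tokenizer: the 2-D codebook and the names `CODEBOOK`, `tokenize`, and `detokenize` are hypothetical, and a real tokenizer would learn a much larger, higher-dimensional codebook from pose data.

```python
import math

# Hypothetical toy codebook: each entry stands in for one "valid pose" feature.
# A learned tokenizer would hold many higher-dimensional entries.
CODEBOOK = [
    (0.0, 0.0),
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0, 1.0),
]


def quantize(feature):
    """Return the index (token) of the codebook entry nearest to `feature`."""
    return min(range(len(CODEBOOK)), key=lambda i: math.dist(feature, CODEBOOK[i]))


def tokenize(features):
    """Map a sequence of continuous features to discrete token indices."""
    return [quantize(f) for f in features]


def detokenize(tokens):
    """Map token indices back to codebook vectors (a lossy reconstruction)."""
    return [CODEBOOK[t] for t in tokens]


if __name__ == "__main__":
    continuous = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.3)]
    tokens = tokenize(continuous)
    print(tokens)              # [0, 3, 2]
    print(detokenize(tokens))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

Because every token decodes to a codebook entry, a model that predicts tokens can only output combinations drawn from that vocabulary, which is the intuition behind treating pose regression as token prediction.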

Quick Start & Requirements

Installation: Clone the repository, create a Conda environment with Python 3.10, and install PyTorch (tested with 2.1.0/CUDA 11.8) and other dependencies via requirements.txt.
Prerequisites: Python 3.10, PyTorch 2.1.0, CUDA 11.8 (recommended), Detectron2 (for image demos), and a forked version of PHALP (for video demos).
Data: Requires downloading SMPL/SMPLH body models and checkpoints via fetch_demo_data.sh. Training and evaluation require additional datasets (AMASS, MOYO, BEDLAM, 4DHumans, 3DPW, EMDB), each of which requires registration and acceptance of its license.
Links: Website, YouTube, arXiv.
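Since the setup is tied to specific versions, a quick sanity check before running the demos can save time. The sketch below compares the running interpreter and the installed PyTorch against the versions listed above; the helper name `check_environment` is hypothetical and is not part of the repository.

```python
import importlib.util
import sys

# Versions taken from the requirements listed above.
EXPECTED_PYTHON = (3, 10)
EXPECTED_TORCH_PREFIX = "2.1.0"


def check_environment(python_version=None, torch_version=None):
    """Return a list of warnings; an empty list means no mismatch was found.

    Both arguments default to what is detected on the current machine.
    """
    warnings = []

    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        warnings.append(f"Python {found} found, 3.10 expected")

    if torch_version is None:
        # Avoid an ImportError when PyTorch is not installed at all.
        if importlib.util.find_spec("torch") is None:
            warnings.append("PyTorch is not installed")
            return warnings
        import torch
        torch_version = str(torch.__version__)
    if not torch_version.startswith(EXPECTED_TORCH_PREFIX):
        warnings.append(f"PyTorch {torch_version} found, 2.1.0 expected")

    return warnings


if __name__ == "__main__":
    problems = check_environment()
    print("Environment looks fine" if not problems else "\n".join(problems))
```

Other versions may well work; the check only flags deviations from the tested configuration.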

Highlighted Details

Reports state-of-the-art results on benchmarks such as 3DPW and EMDB, with MPJPE (mean per-joint position error) values as low as 44.8 mm.
Offers multiple pre-trained models for different configurations and training iterations.
Includes code for both tokenization and the main TokenHMR model.
Provides demo scripts for running inference on single images and videos.
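For readers unfamiliar with the MPJPE metric quoted above, the following sketch computes it as the average Euclidean distance between predicted and ground-truth 3D joints. It assumes both skeletons are first aligned at a root joint (index 0 here, an assumption); actual benchmark protocols differ in joint sets and alignment, so this is an illustration rather than the evaluation code used for the reported numbers.

```python
import math


def mpjpe(pred, gt, root=0):
    """Mean per-joint position error after root alignment.

    `pred` and `gt` are equal-length lists of (x, y, z) joints; the result
    is in the same unit as the inputs (for example millimetres).
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and the same length")

    def centered(joints):
        # Subtract the root joint so that global translation is ignored.
        r = joints[root]
        return [tuple(c - rc for c, rc in zip(j, r)) for j in joints]

    p, g = centered(pred), centered(gt)
    return sum(math.dist(a, b) for a, b in zip(p, g)) / len(p)


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]
    # Prediction is shifted by 10 mm in x and has one joint off by (0, 3, 4).
    pred = [(10.0, 0.0, 0.0), (10.0, 103.0, 4.0), (10.0, 200.0, 0.0)]
    print(round(mpjpe(pred, gt), 3))  # 1.667
```

In the example, the constant 10 mm offset vanishes after root alignment, leaving a single 5 mm joint error averaged over three joints.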

Maintenance & Community

The project is associated with the Max Planck Institute for Intelligent Systems and ETH Zurich. The latest update was on July 2, 2024, releasing a new model for diverse poses. Contact information for code and commercial licensing is provided.

Licensing & Compatibility

The code is available for non-commercial scientific research purposes. Commercial licensing inquiries should be directed to ps-licensing@tuebingen.mpg.de. Third-party datasets are subject to their respective licenses.

Limitations & Caveats

The project requires Python 3.10 and a specific PyTorch version (tested with 2.1.0), along with careful setup of CUDA and other dependencies. Downloading and preparing the datasets involves registration and agreement to terms, which may slow adoption.
