PoseGPT by yfeng95

Multimodal LLM for 3D human pose understanding and reasoning

Created 2 years ago

290 stars

Top 91.0% on SourcePulse

Project Summary

ChatPose (formerly PoseGPT) is a multimodal large language model for understanding and reasoning about 3D human poses in SMPL format. Users can query and infer human poses from image and text inputs. It targets researchers and developers working in computer vision and human pose estimation.

How It Works

ChatPose integrates a large language model with specialized components for processing 3D human pose data. It leverages existing multi-modal architectures like LLaVA and LISA, adapting them to understand and generate SMPL pose representations, enabling conversational interaction about human movements.
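
To make "SMPL pose representations" more concrete, here is a minimal sketch assuming the standard SMPL convention of 24 joint rotations (1 root plus 23 body joints) stored as axis-angle vectors in a flat list of 72 values. The function names are hypothetical, and nothing here is taken from the ChatPose code; how the model encodes poses internally is not described in this summary.

```python
import math

NUM_SMPL_JOINTS = 24  # standard SMPL: 1 root (global orientation) + 23 body joints


def axis_angle_to_matrix(rx, ry, rz):
    """Convert one axis-angle vector to a 3x3 rotation matrix (Rodrigues' formula)."""
    angle = math.sqrt(rx * rx + ry * ry + rz * rz)
    if angle < 1e-8:
        # Near-zero rotation: return the identity.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = rx / angle, ry / angle, rz / angle
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def pose_to_matrices(pose):
    """Split a flat 72-value pose vector into 24 per-joint rotation matrices."""
    if len(pose) != NUM_SMPL_JOINTS * 3:
        raise ValueError(f"expected {NUM_SMPL_JOINTS * 3} values, got {len(pose)}")
    return [axis_angle_to_matrix(*pose[i:i + 3]) for i in range(0, len(pose), 3)]


# Rest pose (all zeros) with the root rotated 90 degrees about the z axis.
pose = [0.0] * (NUM_SMPL_JOINTS * 3)
pose[2] = math.pi / 2
matrices = pose_to_matrices(pose)
```

In this sketch, `matrices[0]` is the root rotation and the remaining 23 entries are identity matrices, which corresponds to a body in its rest pose turned a quarter turn about the z axis.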

Quick Start & Requirements

Install: bash install_conda.sh
Data: bash fetch_data.sh (downloads the data required for the SMPL-X models)
Inference: python main_chat.py for text-only chat; python main_chat.py --image_file dataset/baber.png for image input
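
The quick-start steps can be collected into one place as a dry run. The sketch below only assembles and prints the command lines listed above; it does not execute them. The helper names (`setup_commands`, `chat_command`) are hypothetical, and running install before data download is an assumed order, so check the repository README before relying on it.

```python
import shlex


def setup_commands():
    """One-time setup commands from the quick start (order is an assumption)."""
    return [
        ["bash", "install_conda.sh"],  # environment setup
        ["bash", "fetch_data.sh"],     # download data for the SMPL-X models
    ]


def chat_command(image_file=None):
    """Build the inference command; pass image_file to chat about an image."""
    cmd = ["python", "main_chat.py"]
    if image_file is not None:
        cmd += ["--image_file", image_file]
    return cmd


# Print every step as a shell-ready line instead of executing it.
steps = setup_commands() + [chat_command(), chat_command("dataset/baber.png")]
lines = [shlex.join(step) for step in steps]
for line in lines:
    print(line)
```

To actually run a step from the repository root, the same argument list could be passed to `subprocess.run(step, check=True)`.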

Highlighted Details

Multi-modal LLM for 3D human pose understanding and reasoning.
Supports querying and inferring poses from images and text.
Built upon LLaVA and LISA architectures.
References related work in TokenHMR, PoseScript, and 4D-Humans.

Maintenance & Community

The project is associated with Yao Feng, Jing Lin, Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, and Michael J. Black. It acknowledges contributions from the LLaVA and LISA projects.

Licensing & Compatibility

The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as an implementation of ChatPose (formerly PoseGPT) and may be under active development. Specific limitations or unsupported features are not detailed in the provided README.

Health Check

Last Commit: 1 year ago

Responsiveness: Inactive

Star History

3 stars in the last 30 days