MS-G3D by kenziyuliu

PyTorch code for the skeleton-based action recognition paper "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition" (CVPR 2020)

Created 5 years ago
453 stars

Top 66.5% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of MS-G3D, a graph convolutional network for skeleton-based action recognition. It targets researchers and practitioners in computer vision and human-computer interaction, offering a model that disentangles multi-scale graph convolutions and unifies spatial-temporal modeling to improve recognition performance.

How It Works

The project rests on two ideas. First, it disentangles multi-scale graph convolutions: rather than building larger neighborhoods from powers of the adjacency matrix, which lets features from nearby joints dominate every scale, each scale aggregates only the joints at a fixed shortest-path distance, removing redundant dependencies between scales. Second, it unifies spatial and temporal modeling with the G3D operator, which runs graph convolutions over cross-spacetime edges inside a sliding spatial-temporal window, so relationships between body joints and their evolution over time are learned jointly rather than by alternating spatial and temporal layers. Together these make up the MS-G3D feature extractor behind the reported state-of-the-art results.
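The disentangled neighborhoods can be illustrated with a minimal sketch (a toy example of the idea, not the repository's code): scale k keeps only joints at shortest-path distance exactly k, instead of the cumulative neighborhoods produced by powers of the adjacency matrix.

```python
import numpy as np

def k_adjacency(A, k, with_self=False):
    """Binary adjacency of joints at shortest-path distance exactly k.

    A is an (N, N) binary adjacency matrix of the skeleton graph.
    Masking by exact distance (distance <= k minus distance <= k-1)
    is the "disentangled" multi-scale aggregation described above.
    """
    I = np.eye(A.shape[0], dtype=A.dtype)
    if k == 0:
        return I
    within_k = (np.linalg.matrix_power(A + I, k) > 0).astype(A.dtype)
    within_k_minus_1 = (np.linalg.matrix_power(A + I, k - 1) > 0).astype(A.dtype)
    Ak = within_k - within_k_minus_1
    if with_self:
        Ak += I
    return Ak

# Toy 4-joint chain 0-1-2-3 (a stand-in for a limb of the skeleton graph)
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1
A2 = k_adjacency(A, 2)  # joints exactly two hops apart: (0,2) and (1,3)
```

Each `A_k` then drives its own aggregation branch, so distant joints contribute on equal footing with nearby ones.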

Quick Start & Requirements

  • Install: pip install -r requirements.txt (after cloning)
  • Prerequisites: Python >= 3.6, PyTorch >= 1.2.0, NVIDIA Apex, PyYAML, tqdm, tensorboardX.
  • Data: Requires downloading the NTU RGB+D 60/120 and Kinetics 400 datasets, totaling roughly 38-77 GB after preprocessing; data generation can take several hours.
  • Resources: Mixed precision training (--half) is recommended for GPUs with ~11GB memory.
  • Links: PDF, Demo

Highlighted Details

  • CVPR 2020 Oral paper.
  • Supports joint-bone two-stream fusion for enhanced accuracy.
  • Offers pretrained models for NTU RGB+D 60/120 and Kinetics 400.
  • Mixed precision training with NVIDIA Apex is supported for memory efficiency.
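Two-stream fusion of this kind is typically done at the score level: each stream's per-class scores are summed, optionally with a weight, before taking the argmax. A stdlib-only sketch of the idea (the function name and `alpha` weight are illustrative, not the repository's API):

```python
def fuse_scores(joint_scores, bone_scores, alpha=1.0):
    """Weighted score-level fusion of joint and bone streams.

    joint_scores / bone_scores: lists of per-class score lists, one per
    sample. Returns the predicted class index for each sample.
    """
    preds = []
    for j, b in zip(joint_scores, bone_scores):
        fused = [js + alpha * bs for js, bs in zip(j, b)]
        preds.append(max(range(len(fused)), key=fused.__getitem__))
    return preds

# One sample where the bone stream overrules the joint stream,
# one where both streams agree.
preds = fuse_scores([[0.6, 0.4], [0.9, 0.1]],
                    [[0.1, 0.9], [0.8, 0.2]])
```

Because fusion happens after inference, the two streams can be trained and evaluated independently.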

Maintenance & Community

  • Based on 2s-AGCN and ST-GCN projects.
  • Contact: kenziyuliu AT outlook.com

Licensing & Compatibility

  • The README does not explicitly state a license. However, the project is based on other repositories which may have their own licenses. Users should verify licensing for commercial use.

Limitations & Caveats

Memory usage can be sensitive to PyTorch/CUDA versions and GPU setups, potentially leading to OOM errors. Using Apex O2 mode may require single-GPU training due to nn.DataParallel incompatibility. The best fusion results might not always come from the top-performing individual stream models.
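The Apex O2 caveat can be handled with a simple guard before wrapping the model. This is a hedged sketch, not the repository's code: `wrap_model` and its device-id handling are hypothetical.

```python
import torch.nn as nn

def wrap_model(model, opt_level, device_ids):
    # Apex O2 casts model weights to FP16, which nn.DataParallel's
    # replicate step does not handle; fall back to a single GPU then.
    if opt_level == "O2" and len(device_ids) > 1:
        device_ids = device_ids[:1]  # hypothetical guard: force single GPU
    if len(device_ids) > 1:
        model = nn.DataParallel(model, device_ids=device_ids)
    return model
```

With O1 (or a single visible GPU) the model is wrapped as usual; with O2 and multiple GPUs it stays a plain module.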

Health Check
Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
1 star in the last 30 days

