3D human pose estimator from 2D joint estimates in RGB images
MocapNET is a real-time system for estimating 3D human pose from monocular RGB images, outputting results in the BVH format. It is designed for researchers and developers in computer vision and animation who need accurate and efficient human motion capture. The system offers a 33% accuracy improvement on the Human3.6M dataset compared to its baseline while maintaining real-time performance.
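Because the output is plain-text BVH, it is easy to inspect or feed into animation tools. As a rough illustration, assuming a hypothetical output file named out.bvh (MocapNET does not guarantee this name), a few lines of Python can read the frame count and frame time from the file's MOTION section:

import sys

# Minimal sketch: pull frame count and frame time from a BVH file.
# "out.bvh" is a hypothetical output name used for illustration.
def bvh_motion_info(path):
    frames, frame_time = None, None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("Frames:"):
                frames = int(line.split(":")[1])
            elif line.startswith("Frame Time:"):
                frame_time = float(line.split(":")[1])
    return frames, frame_time

frames, frame_time = bvh_motion_info(sys.argv[1] if len(sys.argv) > 1 else "out.bvh")
print(frames, "frames,", frame_time, "seconds per frame")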
How It Works
MocapNET employs a novel, compact 2D pose representation (NSRM) and an ensemble of orientation-tuned neural networks. It decomposes the human body into upper and lower kinematic hierarchies, enabling pose recovery even with significant occlusions. An efficient Inverse Kinematics solver further refines the neural network's output, ensuring pose consistency with known limb sizes for personalized tracking.
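The limb-size consistency idea behind the IK step can be illustrated with a deliberately simplified stand-in: the snippet below rescales each predicted bone to a known length, walking from parent to child joint. This is a minimal sketch only; the two-bone skeleton and lengths are toy values, and MocapNET's actual solver is considerably more involved.

import numpy as np

# Toy skeleton: (parent joint, child joint, known limb length in meters).
# These values are illustrative, not MocapNET's body model.
BONES = [(0, 1, 0.45), (1, 2, 0.40)]

def enforce_limb_lengths(pose):
    """Rescale each bone of a (num_joints, 3) pose to its known length."""
    pose = pose.copy()
    for parent, child, length in BONES:
        d = pose[child] - pose[parent]
        pose[child] = pose[parent] + d * (length / np.linalg.norm(d))
    return pose

raw = np.array([[0.0,  0.0, 0.0],   # hip
                [0.0, -0.5, 0.1],   # knee (predicted slightly too far)
                [0.0, -0.9, 0.2]])  # ankle
print(enforce_limb_lengths(raw))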
Quick Start & Requirements
The bundled initialize.sh script handles dependency downloads and model setup:

./initialize.sh
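Once setup completes, the repository's demos can be pointed at a webcam or a video file. The exact entry point depends on the version built; for the C++ v2 builds the upstream README runs a command along these lines, using the sample clip it references (verify the binary name against your checkout):

./MocapNET2LiveWebcamDemo --from shuffle.webm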
Maintenance & Community
The project's primary developer recently defended their PhD thesis on MocapNET, with plans for continued development as funding allows. The codebase was rewritten in Python for MocapNET v4, which lives on a dedicated mnet4 branch.
Licensing & Compatibility
This library is provided under the FORTH license.
Limitations & Caveats
The project has experienced periods of reduced maintenance due to the developer's PhD commitments. While the initialize.sh script automates setup, GPU compatibility issues with specific TensorFlow C-API builds may require manual configuration or recompiling the TensorFlow C-API from source. The bundled 2D joint detectors are faster but less accurate than full OpenPose implementations.
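One quick diagnostic, assuming libtensorflow was installed under /usr/local/lib (adjust the path to wherever your setup placed it), is to check whether the C-API library links against CUDA at all:

ldd /usr/local/lib/libtensorflow.so | grep -iE 'cuda|cudnn'

An empty result on a GPU machine typically indicates a CPU-only build was installed.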