CVPR2021 paper for 6D pose estimation via bidirectional RGBD fusion
FFB6D is a PyTorch framework for 6D pose estimation from single RGBD images, targeting researchers and practitioners in robotics and computer vision. It offers a general representation learning approach with a novel bidirectional fusion network, achieving state-of-the-art results on benchmark datasets like LineMOD and YCB-Video.
How It Works
FFB6D employs a full-flow bidirectional fusion strategy: fusion modules exchange information between the RGB network and the point cloud network at each encoding and decoding layer. Each network can therefore draw on the other's complementary local and global features, which yields richer representations. A second contribution is a 3D keypoint selection algorithm that considers both texture and geometry, which makes keypoints easier to localize and so supports precise pose estimation. The framework builds on PVN3D for keypoint voting and instance semantic segmentation.
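To make the geometry side of keypoint selection concrete, here is a minimal sketch of farthest point sampling over a set of candidate 3D points. It is not the repository's implementation: it assumes that texture-salient candidates have already been extracted by some earlier step, and the function name farthest_point_sampling and its parameters are hypothetical.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points that are spread far apart from each other.

    points: list of (x, y, z) tuples, assumed to be pre-selected candidates.
    k:      number of keypoints to pick (hypothetical parameter).
    start:  index of the first point to pick (hypothetical parameter).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(selected) < k:
        # Pick the point that is farthest from everything chosen so far.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

Starting from index 0, the sketch greedily adds whichever candidate lies farthest from the current selection, so the chosen keypoints cover the object's extent rather than clustering in one region.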
Quick Start & Requirements
Install the Python dependencies:
pip3 install -r requirement.txt
Then install apex and normalSpeed, and compile the RandLA-Net operators.
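Because several of these dependencies are installed by hand, a quick import check can confirm the environment before training. The sketch below uses only the standard library. The module names in REQUIRED are assumptions about how each package is imported, and missing_modules is a hypothetical helper, not part of FFB6D.

```python
import importlib.util

# Assumed top-level import names for the manually installed dependencies.
REQUIRED = ["torch", "apex", "normalSpeed"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules(REQUIRED) returns an empty list when every package is importable, and otherwise lists the names that still need to be installed.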
Maintenance & Community
The project accompanies a CVPR2021 Oral paper; its primary author is Yisheng He. The README lists no community channels (such as Discord or Slack) and shows no indicators of active maintenance.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
Installation requires a specific CUDA version (10.1 or 10.2) and manual compilation of custom operators, both of which can complicate setup. The framework is demonstrated primarily on the LineMOD and YCB-Video datasets; adapting it to a new dataset requires careful data preprocessing and configuration.
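One way to catch a CUDA mismatch before compiling the custom operators is to parse the release number that nvcc --version prints. The following is a minimal sketch under that assumption. SUPPORTED_CUDA mirrors the versions named above, and the helper names are hypothetical.

```python
import re

# CUDA releases named in the caveat above.
SUPPORTED_CUDA = ("10.1", "10.2")


def cuda_release(nvcc_output):
    """Extract the 'release X.Y' version from nvcc --version text, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def is_supported(nvcc_output):
    """True if the reported CUDA release is one of the supported versions."""
    return cuda_release(nvcc_output) in SUPPORTED_CUDA
```

The functions take the command output as a string, so the text can come from subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout or from a saved log.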