FFB6D by ethnhe

CVPR2021 paper for 6D pose estimation via bidirectional RGBD fusion

Created 5 years ago

342 stars

Top 81.1% on SourcePulse

Project Summary

FFB6D is a PyTorch framework for 6D pose estimation from a single RGBD image, aimed at researchers and practitioners in robotics and computer vision. It offers a general representation learning approach built on a bidirectional fusion network, and reported state-of-the-art results on the LineMOD and YCB-Video benchmarks at the time of publication.

How It Works

FFB6D employs a full-flow bidirectional fusion strategy: information is exchanged at every encoding and decoding layer between two networks, a CNN that processes the RGB image and a point cloud network that processes the depth-derived points. Each branch can therefore draw on complementary local and global features from the other, which yields richer representations. A second contribution is a 3D keypoint selection algorithm that considers both texture and geometry, so that the selected keypoints are easier to localize and the pose can be estimated more precisely. The framework builds upon PVN3D for keypoint voting and instance semantic segmentation.
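Once 3D keypoints have been located in the scene, keypoint-based pipelines of this kind typically recover the pose by least-squares fitting between the model keypoints and their detected counterparts. The following is a minimal NumPy sketch of that fitting step using the Kabsch algorithm; it is an illustration under stated assumptions, not code from the repository, and the function name `fit_rigid_transform` and its arguments are hypothetical.

```python
import numpy as np


def fit_rigid_transform(model_kps, scene_kps):
    """Least-squares rigid fit (Kabsch): find R, t with scene ~= R @ model + t.

    model_kps, scene_kps: (N, 3) arrays of corresponding 3D keypoints.
    Hypothetical helper for illustration only.
    """
    model_kps = np.asarray(model_kps, dtype=float)
    scene_kps = np.asarray(scene_kps, dtype=float)

    # Center both point sets on their centroids.
    mu_m = model_kps.mean(axis=0)
    mu_s = scene_kps.mean(axis=0)

    # 3x3 cross-covariance between the centered sets.
    H = (model_kps - mu_m).T @ (scene_kps - mu_s)
    U, _, Vt = np.linalg.svd(H)

    # Guard against a reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_s - R @ mu_m
    return R, t


if __name__ == "__main__":
    # Synthetic check: rotate 30 degrees about z and translate.
    theta = np.deg2rad(30.0)
    R_true = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ])
    t_true = np.array([0.1, -0.2, 0.5])
    model = np.random.default_rng(0).normal(size=(8, 3))
    scene = model @ R_true.T + t_true
    R_est, t_est = fit_rigid_transform(model, scene)
    print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

With noisy keypoint detections the same closed-form fit returns the rotation and translation that minimize the summed squared distances, which is why well-localized keypoints matter for the final pose accuracy.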

Quick Start & Requirements

Install: pip3 install -r requirement.txt, followed by installing apex, normalSpeed, and compiling RandLA-Net operators.
Prerequisites: CUDA 10.1/10.2, Python 3, PyTorch. Requires specific dataset preparation (LineMOD, YCB-Video).
Links: arXiv, Video, Demo Video

Highlighted Details

Runs inference in about 75 ms per frame on a single 2080 Ti GPU, with 33.8M parameters.
Outperforms prior methods such as PVN3D and DenseFusion on the YCB-Video and LineMOD datasets.
Includes comprehensive scripts for training, evaluation, and visualization on both LineMOD and YCB-Video datasets.
Provides tools for dataset preparation, including keypoint generation and mesh processing.
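Keypoint generation for keypoint-based pose estimators often relies on farthest point sampling (FPS) over an object's mesh vertices. The sketch below shows plain FPS in NumPy to make the idea concrete; it is not the repository's script, and the function name `farthest_point_sampling` and its parameters are hypothetical. A texture-aware variant could, as an assumption, run the same procedure on a prefiltered set of candidate points rather than on all vertices.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k point indices that are mutually far apart.

    points: (N, 3) array, e.g. mesh vertices. Hypothetical helper.
    start:  index of the first selected point.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")

    selected = [start]
    # Distance from every point to its nearest already-selected point.
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # farthest from the current selection
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)


if __name__ == "__main__":
    vertices = np.random.default_rng(0).uniform(-1.0, 1.0, size=(500, 3))
    keypoint_idx = farthest_point_sampling(vertices, k=8)
    print(vertices[keypoint_idx])
```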

Maintenance & Community

The project accompanies a CVPR 2021 Oral paper; its primary author is Yisheng He. The README lists no community channels (Discord/Slack) and gives no indication of active maintenance.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Installation requires specific CUDA versions (10.1/10.2) and manual compilation of custom operators, which may be difficult to reproduce on newer toolchains. The framework is demonstrated primarily on the LineMOD and YCB-Video datasets; adapting it to a new dataset requires careful data preprocessing and configuration.

Health Check

Last Commit

3 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days