FFB6D by ethnhe

CVPR2021 paper for 6D pose estimation via bidirectional RGBD fusion

created 4 years ago
329 stars

Top 84.2% on sourcepulse

Project Summary

FFB6D is a PyTorch framework for 6D pose estimation from a single RGBD image, aimed at researchers and practitioners in robotics and computer vision. It offers a general representation learning approach built on a bidirectional fusion network, and its CVPR 2021 paper reports state-of-the-art results on the LineMOD and YCB-Video benchmarks.

How It Works

FFB6D uses a full-flow bidirectional fusion strategy: information is exchanged at every encoding and decoding layer of two networks, one processing the RGB image and one processing the point cloud derived from depth. Each network can therefore draw on the other's complementary local and global features, which yields richer representations. The framework also introduces a 3D keypoint selection algorithm that considers both texture and geometry, so the chosen keypoints are easier to localize and pose estimation becomes more precise. Keypoint voting and instance semantic segmentation build on PVN3D.
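The geometry side of keypoint selection can be illustrated with farthest point sampling, a common way to pick keypoints that are spread across an object's surface. The following is a minimal sketch under that assumption, not the repository's implementation: the function name `farthest_point_sampling` and its parameters are hypothetical, and the texture criterion mentioned above is left out entirely.

```python
import math


def farthest_point_sampling(points, k, start_index=0):
    """Pick k indices from `points` (a list of (x, y, z) tuples) that are far apart.

    Hypothetical helper for illustration only.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start_index]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start_index]) for p in points]
    while len(selected) < k:
        # The next keypoint is the point farthest from everything chosen so far.
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


mesh_points = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0, 5, 0)]
keypoint_indices = farthest_point_sampling(mesh_points, 3)
```

Starting from index 0, the sketch first picks the most distant point and then the point farthest from both, so nearby points such as `(1, 0, 0)` are skipped. A texture-aware variant would restrict the candidate set before sampling.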

Quick Start & Requirements

  • Install: run pip3 install -r requirement.txt, then install apex and normalSpeed and compile the RandLA-Net operators.
  • Prerequisites: CUDA 10.1 or 10.2, Python 3, and PyTorch. The LineMOD and YCB-Video datasets must be prepared separately before training or evaluation.
  • Links: Arxiv, Video, Demo Video
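Because the custom operators are compiled against a specific CUDA toolkit, it can help to check the installed version before starting. The sketch below parses the text printed by `nvcc --version` and compares it with the versions listed in the prerequisites. The helper names `parse_cuda_release` and `is_supported` are hypothetical and are not part of FFB6D.

```python
import re

# CUDA releases named in the prerequisites above.
SUPPORTED_CUDA = {"10.1", "10.2"}


def parse_cuda_release(nvcc_output):
    """Return the 'major.minor' release found in `nvcc --version` text, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def is_supported(nvcc_output):
    """True if the reported CUDA release is one of the listed versions."""
    return parse_cuda_release(nvcc_output) in SUPPORTED_CUDA


# Example line in the format nvcc prints; replace with real output on your machine.
sample = "Cuda compilation tools, release 10.2, V10.2.89"
print(is_supported(sample))
```

On a real machine, the input would come from something like `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.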

Highlighted Details

  • Runs inference in 75 ms per frame on a single 2080Ti GPU, with 33.8M parameters.
  • Outperforms prior methods like PVN3D and DenseFusion on YCB-Video and LineMOD datasets.
  • Includes comprehensive scripts for training, evaluation, and visualization on both LineMOD and YCB-Video datasets.
  • Provides tools for dataset preparation, including keypoint generation and mesh processing.
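To put the reported speed and model size in more familiar units, the short calculation below converts them to a frame rate and an approximate weight footprint. It assumes frames are processed one at a time and that weights are stored as 32-bit floats; activations, optimizer state, and framework overhead are not counted, so real memory use will be higher.

```python
inference_ms = 75.0      # per-frame latency reported above
params_millions = 33.8   # parameter count reported above
bytes_per_param = 4      # assumption: float32 weights

frames_per_second = 1000.0 / inference_ms
# Millions of parameters times bytes per parameter gives megabytes (10^6 bytes).
weights_mb = params_millions * bytes_per_param

print(f"{frames_per_second:.1f} frames/s")   # 13.3 frames/s
print(f"{weights_mb:.1f} MB of weights")     # 135.2 MB of weights
```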

Maintenance & Community

The project accompanies a CVPR 2021 Oral paper whose primary author is Yisheng He. The README lists no community channels (such as Discord or Slack) and gives no indication of active maintenance.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Installation requires CUDA 10.1 or 10.2 and manual compilation of custom operators, which can complicate setup on machines that ship with other CUDA versions. The framework is demonstrated mainly on the LineMOD and YCB-Video datasets, so adapting it to a new dataset requires careful data preprocessing and configuration.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History
13 stars in the last 90 days
