FoundationPose by NVlabs

CVPR 2024 paper for unified 6D pose estimation/tracking of novel objects

created 1 year ago
2,325 stars

Top 20.1% on sourcepulse

Project Summary

FoundationPose is a unified foundation model for 6D object pose estimation and tracking that supports both model-based and model-free scenarios. It targets researchers and engineers in robotics and AR/VR who need to estimate the poses of novel objects without object-specific fine-tuning, and it reports state-of-the-art results on challenging benchmarks.
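For readers new to the term, a 6D pose combines three rotational and three translational degrees of freedom. The sketch below shows one common way to represent it, a 4x4 homogeneous transform, using only the standard library. The helper names are hypothetical, and the representation is an assumption for illustration rather than a description of the repository's internal format.

```python
import math

def rotation_z(angle_rad):
    """3x3 rotation about the z axis (one of the three rotational degrees of freedom)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def make_pose(rotation, translation):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 homogeneous transform."""
    pose = [list(row) + [t] for row, t in zip(rotation, translation)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    return pose

def transform_point(pose, point):
    """Map a 3D point from the object frame into the camera frame."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in pose[:3])

# Illustrative values: object rotated 90 degrees about z, 0.5 m in front of the camera.
object_in_camera = make_pose(rotation_z(math.pi / 2), [0.1, 0.0, 0.5])
print(transform_point(object_in_camera, (1.0, 0.0, 0.0)))  # approximately (0.1, 1.0, 0.5)
```

Pose estimation recovers such a transform from a single observation, while tracking updates it from frame to frame.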

How It Works

The approach unifies the model-based setup (a CAD model is available) and the model-free setup (only a few reference images are available) through a neural implicit representation that supports novel view synthesis. Because the object can be rendered from new viewpoints in either setup, the downstream pose estimation modules stay the same across both. Strong generalizability comes from large-scale synthetic training, a transformer-based architecture, contrastive learning, and LLM-aided data generation.
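The idea that the pose estimation modules stay the same across both setups can be sketched as a shared rendering interface. Everything below is hypothetical: the class names, the `render` method, and the toy scoring loop are illustrative stand-ins, not the project's actual code, and scalar values stand in for real poses and images.

```python
from abc import ABC, abstractmethod

class ObjectRepresentation(ABC):
    """Hypothetical interface: anything that can render the object at a given pose."""

    @abstractmethod
    def render(self, pose):
        """Return a rendered view (here just a small dict) for the pose."""

class CadModel(ObjectRepresentation):
    """Model-based setup: a CAD model is provided."""

    def __init__(self, mesh_path):
        self.mesh_path = mesh_path

    def render(self, pose):
        return {"source": "cad", "pose": pose}

class NeuralObjectField(ObjectRepresentation):
    """Model-free setup: a representation built from a few reference images."""

    def __init__(self, reference_images):
        self.reference_images = list(reference_images)

    def render(self, pose):
        return {"source": "neural_field", "pose": pose}

def estimate_pose(representation, observation, candidate_poses, score_fn):
    """Pick the candidate whose rendering best matches the observation.

    This function only calls render(), so it does not care which setup is used.
    """
    return max(candidate_poses,
               key=lambda pose: score_fn(representation.render(pose), observation))

# Toy usage: higher score means the rendered pose is closer to the observed one.
toy_score = lambda view, observed: -abs(view["pose"] - observed)
candidates = [0.0, 0.5, 1.0]
for rep in (CadModel("object.obj"), NeuralObjectField(["ref_0.png", "ref_1.png"])):
    print(type(rep).__name__, estimate_pose(rep, 0.6, candidates, toy_score))
```

The design point is that `estimate_pose` depends only on the interface, which is one plausible reading of how a rendering-capable representation lets the same downstream modules serve both setups.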

Quick Start & Requirements

  • Installation: Docker is recommended (docker pull wenbowen123/foundationpose). For CUDA 12.1 support, use shingarey/foundationpose_custom_cuda121:latest. Conda installation is experimental.
  • Prerequisites: Python 3.9+, Eigen 3.4.0, NVDiffRast, PyTorch 2.0.0+ (with CUDA), Kaolin (for model-free).
  • Data: Requires downloading network weights, demo data, and optionally large-scale training data and preprocessed reference views.
  • Demo: python run_demo.py
  • Datasets: python run_linemod.py and python run_ycb_video.py for model-based; bundlesdf/run_nerf.py followed by run_ycb_video.py for model-free.
  • Links: Paper, Website, ROS Version, Demos
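As a rough aid for the steps above, the following sketch checks whether the listed Python prerequisites are visible to the current interpreter and prints the script order for each workflow. The import names (`torch`, `nvdiffrast`, `kaolin`) and the helper functions are assumptions for illustration. Real runs very likely need extra arguments, such as dataset paths, that this summary does not list, so the runner defaults to a dry run.

```python
import importlib.util
import subprocess
import sys

# Import names assumed for the Python prerequisites listed above.
PYTHON_PACKAGES = ["torch", "nvdiffrast", "kaolin"]

def preflight():
    """Report which listed prerequisites are visible to this interpreter."""
    report = {"python>=3.9": sys.version_info >= (3, 9)}
    for name in PYTHON_PACKAGES:
        report[name] = importlib.util.find_spec(name) is not None
    return report

# Script names come from the list above. The two model-based scripts are
# independent dataset runs; the model-free pair is meant to run in order.
WORKFLOWS = {
    "demo": ["run_demo.py"],
    "model_based": ["run_linemod.py", "run_ycb_video.py"],
    "model_free": ["bundlesdf/run_nerf.py", "run_ycb_video.py"],
}

def build_commands(workflow):
    """Return one argument list per script, without any extra flags."""
    return [["python", script] for script in WORKFLOWS[workflow]]

def run(workflow, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in build_commands(workflow):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)

print(preflight())
run("model_free")  # dry run: prints the two commands without executing them
```

Eigen is a C++ dependency and cannot be checked this way, so it is left out of the preflight report.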

Highlighted Details

  • Achieved No. 1 on the world-wide BOP leaderboard for model-based novel object pose estimation (as of March 2024).
  • Outperforms existing methods specialized for each task by a large margin, and achieves results comparable to instance-level methods.
  • Supports instant application to novel objects without fine-tuning.
  • Leverages LLMs for synthetic data generation.

Maintenance & Community

  • Developed by NVlabs with contributors including Bowen Wen, Wei Yang, Jan Kautz, and Stan Birchfield.
  • Acknowledges NVIDIA Isaac Sim and Omniverse for their support.
  • Contact: Bowen Wen.

Licensing & Compatibility

  • NVIDIA Source Code License.
  • Copyright © 2024, NVIDIA Corporation. All rights reserved.
  • Restrictions: Not stated explicitly here, but the NVIDIA Source Code License may limit commercial use.

Limitations & Caveats

  • The release does not include the diffusion-based texture-augmented data or the corresponding weights because of legal restrictions, which may cause a slight drop in performance.
  • The troubleshooting section notes potential issues with newer GPUs (e.g., 4090) and with Windows setup.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 2
  • Issues (30d): 11
  • Star history: 299 stars in the last 90 days
