FoundationPose-plus-plus by teal024

Real-time 6D object pose tracking for dynamic scenes

Created 1 year ago

291 stars

Top 90.6% on SourcePulse

Project Summary

Summary

FoundationPose++ extends FoundationPose to real-time 6D object pose tracking in highly dynamic scenes. By integrating standard 2D trackers, direct depth readings, and a Kalman Filter, it provides a robust solution for applications that need precise object localization in complex, moving environments. It targets robotics engineers and researchers seeking better performance and reliability than existing methods offer.

How It Works

This project enhances FoundationPose by replacing its 'pseudo-tracking' with a more robust, engineering-driven approach. For the 6 degrees of freedom, it leverages common 2D trackers (like OSTrack) for X and Y coordinates, directly uses depth sensor data for the Z axis, and employs a Kalman Filter to track orientation (roll, pitch, yaw). This modular integration ensures that the core FoundationPose refinement network is efficiently utilized for tracking, maintaining real-time performance. An optional amodal completion module is also included to bolster performance under occlusion.
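The split described above (2D tracker for X and Y, depth for Z, Kalman Filter for orientation) can be illustrated with a minimal sketch. This is not the project's code: the function and class names, the pinhole back-projection step, the per-angle scalar filter, and the noise values are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection: pixel (u, v) plus metric depth -> camera-frame (X, Y, Z)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def wrap_angle(a):
    """Wrap an angle in radians to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


@dataclass
class AngleKalman:
    """Scalar Kalman filter with a constant-angle (random-walk) model.

    Hypothetical helper: one instance per Euler angle, noise values untuned.
    """
    angle: float                # state estimate in radians
    var: float = 1.0            # variance of the estimate
    process_var: float = 1e-3   # assumed process noise
    meas_var: float = 1e-2      # assumed measurement noise

    def update(self, measured):
        # Predict: the angle is assumed unchanged, so only uncertainty grows.
        self.var += self.process_var
        # Correct: blend in the innovation, wrapped so +pi and -pi are neighbours.
        gain = self.var / (self.var + self.meas_var)
        innovation = wrap_angle(measured - self.angle)
        self.angle = wrap_angle(self.angle + gain * innovation)
        self.var *= (1.0 - gain)
        return self.angle


def fuse_pose(box_center, depth_m, intrinsics, rpy_measured, filters):
    """Combine the three sources into (translation, filtered roll/pitch/yaw)."""
    u, v = box_center
    translation = backproject(u, v, depth_m, *intrinsics)
    rpy = tuple(f.update(m) for f, m in zip(filters, rpy_measured))
    return translation, rpy
```

Here `box_center` stands in for the center of the 2D tracker's bounding box, `depth_m` for the depth reading at that pixel, and `intrinsics` for the camera parameters `(fx, fy, cx, cy)`. Filtering each Euler angle independently is a simplification; a real tracker may use a different state representation.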

Quick Start & Requirements

Installation follows the instructions in install.md. Data preparation involves organizing RGB, depth, and mesh files per frame. A lego_20fps demo is available on Google Drive (https://drive.google.com/file/d/1oN5IZHKlb06hEol6akwx1ibCiVcJBuuI/view?usp=sharing) and runs with the provided Python script (src/obj_pose_track.py) once the environment variables and paths are set. Inference on custom data requires an initial object mask, which can be obtained by using Qwen-VL for bounding boxes and SAM-HQ for masks. Tested on NVIDIA RTX 4090 and H800 GPUs with Ubuntu.
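Before running the tracker on custom data, it can help to confirm that every RGB frame has a matching depth frame. The sketch below assumes a hypothetical layout with `rgb/` and `depth/` subdirectories whose files share a filename stem; the layout the project actually expects is defined in its own documentation, so adjust the names accordingly.

```python
from pathlib import Path


def pair_frames(root, rgb_dir="rgb", depth_dir="depth", exts=(".png", ".jpg")):
    """Return (rgb_path, depth_path) pairs matched by filename stem, sorted by stem.

    The directory names and extensions are assumptions, not the project's spec.
    """
    root = Path(root)
    rgb = {p.stem: p for p in (root / rgb_dir).iterdir() if p.suffix.lower() in exts}
    depth = {p.stem: p for p in (root / depth_dir).iterdir() if p.suffix.lower() in exts}
    # Any stem present on only one side means a frame is missing its partner.
    unpaired = sorted(set(rgb) ^ set(depth))
    if unpaired:
        raise ValueError(f"frames without a partner: {unpaired[:5]}")
    return [(rgb[s], depth[s]) for s in sorted(rgb)]
```

Failing early on an unpaired frame avoids silently misaligned RGB and depth inputs partway through a sequence.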

Highlighted Details

Achieves real-time performance, with the core tracking loop running efficiently and 2D trackers like OSTrack capable of exceeding 100 FPS.
Employs a practical, engineering-focused methodology combining established computer vision and filtering techniques.
Includes an amodal completion module aimed at improving robustness against object occlusion.
Released as a public preview on March 12, 2025.

Maintenance & Community

The project was released as a public preview on March 12, 2025. The README does not explicitly describe active maintenance, core contributors, or community channels (such as Discord or Slack), though it mentions a link to "RedNote Motivation" for discussion.

Licensing & Compatibility

License information is not specified. This omission requires clarification for assessing commercial use or integration into closed-source projects.

Limitations & Caveats

The amodal completion module is noted as potentially slow and as having compatibility issues; optimization is ongoing. The unspecified license is also a potential adoption blocker.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

16 stars in the last 30 days