google-deepmind: State-of-the-art point tracking for video and robotics
Top 24.7% on SourcePulse
Summary
Google DeepMind's TAPNet repository addresses the challenge of Tracking Any Point (TAP) in videos, offering state-of-the-art models like TAPIR and TAPNext. It provides datasets and benchmarks (TAP-Vid, TAPVid-3D, RoboTAP) crucial for advancing computer vision and robotics research, enabling precise point tracking across diverse scenarios.
How It Works
TAPIR employs a two-stage approach, frame-wise matching followed by temporal refinement, achieving both high speed and accuracy. TAPNext reformulates point tracking as next-token prediction, yielding a simpler and more efficient tracker. BootsTAP enhances performance through self-supervised learning on unlabeled data, improving robustness to image transformations and query-point variations.
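As a conceptual illustration of the first stage (a minimal JAX sketch, not the repository's implementation; all function and variable names are illustrative), frame-wise matching correlates the query point's feature vector against every location in each frame and reads off a coarse per-frame position, which the refinement stage then smooths temporally:

```python
# Conceptual sketch of TAPIR's first stage (frame-wise matching) in JAX.
# Names are illustrative, not the repository's API: a query feature is
# correlated against each frame's feature map, and a soft-argmax gives a
# coarse (x, y) estimate per frame that the refinement stage would improve.
import jax
import jax.numpy as jnp

def framewise_match(frame_feats, query_feat, temperature=20.0):
    """frame_feats: [T, H, W, C] per-frame feature maps.
    query_feat:  [C] feature sampled at the query point.
    Returns coarse tracks [T, 2] as (x, y) in feature-grid coordinates."""
    T, H, W, C = frame_feats.shape
    # Cosine similarity between the query and every spatial location.
    feats = frame_feats / (jnp.linalg.norm(frame_feats, axis=-1, keepdims=True) + 1e-6)
    query = query_feat / (jnp.linalg.norm(query_feat) + 1e-6)
    sims = jnp.einsum("thwc,c->thw", feats, query)          # [T, H, W]
    # Soft-argmax over each frame's similarity map.
    probs = jax.nn.softmax(temperature * sims.reshape(T, -1), axis=-1).reshape(T, H, W)
    ys = jnp.arange(H, dtype=jnp.float32)
    xs = jnp.arange(W, dtype=jnp.float32)
    y = jnp.einsum("thw,h->t", probs, ys)
    x = jnp.einsum("thw,w->t", probs, xs)
    return jnp.stack([x, y], axis=-1)                        # [T, 2]

# Toy usage: random features stand in for a real backbone's output.
key = jax.random.PRNGKey(0)
frame_feats = jax.random.normal(key, (8, 32, 32, 64))
query_feat = frame_feats[0, 10, 12]                          # query at frame 0, (y=10, x=12)
coarse_tracks = framewise_match(frame_feats, query_feat)
print(coarse_tracks.shape)                                   # (8, 2)
```

The released models additionally estimate per-frame occlusion and refine these coarse positions iteratively; this sketch shows only the matching idea.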
Quick Start & Requirements
Installation is straightforward: clone the repository and run pip install . from its root. Key dependencies include JAX, with CUDA/cuDNN recommended for GPU acceleration. Colab notebooks offer immediate online demos. For local execution, install the requirements, download the model checkpoints, and run the provided live demo script. Performance benchmarks indicate roughly 17 FPS on a Quadro RTX 4000 for 480x480 images.
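A rough inference sketch follows. The input shapes and preprocessing mirror the Colab demos, but the commented checkpoint-loading and inference calls are hypothetical placeholders, not the repository's verified API:

```python
# Hedged sketch of offline inference; the checkpoint path, loading helper,
# and inference call below are assumptions, not a verified API.
import numpy as np

# Video as [num_frames, height, width, 3]; the demos resize frames to 256x256
# and scale pixel values to [-1, 1] before feeding the model (assumption).
video = np.random.randint(0, 255, (24, 256, 256, 3), dtype=np.uint8)
video = video.astype(np.float32) / 255.0 * 2.0 - 1.0

# Query points as [num_points, 3] in (t, y, x) frame/raster coordinates.
query_points = np.array([[0, 128.0, 64.0],
                         [0, 200.0, 200.0]], dtype=np.float32)

# model = load_tapir_checkpoint("checkpoints/...")              # hypothetical helper
# tracks, visibles = model(video[None], query_points[None])     # hypothetical call
# tracks: [1, num_points, num_frames, 2] as (x, y); visibles: [1, num_points, num_frames]
```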
Maintenance & Community
Developed by Google DeepMind, the project is backed by multiple research publications, indicating active development and validation. No specific community channels (e.g., Discord, Slack) are listed in the README.
Licensing & Compatibility
Core software is licensed under Apache 2.0, permitting commercial use. TAPVid-3D has a separate license, while dataset annotations and videos (TAP-Vid, RoboTAP) are under CC-BY 4.0. Source videos (DAVIS, Kinetics) retain their original licenses. Users should verify compatibility for specific video sources.
Limitations & Caveats
The README does not detail explicit limitations, alpha status, or known bugs. Careful attention is required for coordinate-system conventions: dataset annotations and model inputs differ between normalized raster and regular raster coordinates, and between (x, y) and (t, y, x) ordering. Accuracy may improve at higher input resolutions, but this increases computational demands, and throughput depends on the available hardware.
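For instance, a small helper for moving between the two conventions might look like the following sketch; it assumes annotations store (x, y) normalized to [0, 1] while the trackers consume (t, y, x) in pixel units, so verify these assumptions against the repository's coordinate note before relying on it:

```python
import numpy as np

def normalized_xy_to_tyx(points_xy, frame_indices, height, width):
    """Convert (x, y) points in normalized raster coordinates (assumed [0, 1]
    range, as in the dataset annotations) to (t, y, x) pixel coordinates
    (the ordering the trackers are assumed to consume).

    points_xy:     [N, 2] array of (x, y) in [0, 1].
    frame_indices: [N] array of query frame indices.
    """
    x = points_xy[:, 0] * width
    y = points_xy[:, 1] * height
    return np.stack([frame_indices.astype(np.float32), y, x], axis=-1)

# Example: two queries on frame 0 of a 480x640 video.
pts = np.array([[0.5, 0.5], [0.25, 0.75]])
print(normalized_xy_to_tyx(pts, np.array([0, 0]), height=480, width=640))
```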