3D vision transformer positional encoding
This repository introduces PRoPE (Cameras as Relative Positional Encoding), a novel method for incorporating 3D geometric relationships between image tokens in multi-view transformers. It addresses the challenge of binding positional information in computer vision tasks, offering a simple and efficient alternative to existing approaches for applications like novel view synthesis.
How It Works
PRoPE leverages relative projective transformations to encode camera parameters, enabling transformers to understand the 3D spatial relationships between image patches. This approach is implemented as a drop-in replacement for standard scaled dot-product attention, directly integrating camera intrinsics and extrinsics into the attention mechanism.
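To make the idea concrete, here is a minimal sketch of the geometric quantity involved, assuming 4x4 world-to-camera matrices and 3x3 pinhole intrinsics; the function and argument names are illustrative, not the repository's API:

```python
import torch

def relative_projective_transform(viewmat_i: torch.Tensor, K_i: torch.Tensor,
                                  viewmat_j: torch.Tensor, K_j: torch.Tensor) -> torch.Tensor:
    """Relative projective transform taking camera i's image space to camera j's.

    viewmat_*: (4, 4) world-to-camera extrinsics; K_*: (3, 3) pinhole intrinsics.
    Shapes and names are assumptions for illustration only.
    """
    def full_projection(viewmat: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
        # Embed the 3x3 intrinsics in a 4x4 homogeneous matrix and compose
        # with the extrinsics to get a world -> image-space projection.
        K4 = torch.eye(4, dtype=viewmat.dtype)
        K4[:3, :3] = K
        return K4 @ viewmat

    P_i = full_projection(viewmat_i, K_i)
    P_j = full_projection(viewmat_j, K_j)
    # Undo camera i's projection, then apply camera j's: the result depends
    # only on the relative geometry between the two views, not on any
    # absolute world frame.
    return P_j @ torch.linalg.inv(P_i)
```

Because only relative transforms enter the attention computation, the encoding is invariant to the choice of world coordinate frame.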
Quick Start & Requirements
Implementations are provided for both JAX and PyTorch (prope/jax.py, prope/torch.py). The attention function expects viewmats (world-to-camera matrices), Ks (camera intrinsic matrices), the image dimensions, and the patch size. To adopt PRoPE, replace torch.nn.functional.scaled_dot_product_attention with prope_dot_product_attention, passing the camera parameters and image/patch dimensions, as in the sketch below.
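A hedged usage sketch of the PyTorch entry point: viewmats, Ks, and the module path come from the README, while the keyword names for the image and patch dimensions are assumptions; check prope/torch.py for the exact signature.

```python
import torch
from prope.torch import prope_dot_product_attention  # PyTorch implementation

B, n_views, H, W, patch = 2, 4, 256, 256, 16
n_tokens = n_views * (H // patch) * (W // patch)
heads, dim = 8, 64

# Query/key/value tokens for all patches across all views.
q = torch.randn(B, heads, n_tokens, dim)
k = torch.randn(B, heads, n_tokens, dim)
v = torch.randn(B, heads, n_tokens, dim)

viewmats = torch.eye(4).expand(B, n_views, 4, 4)  # world-to-camera extrinsics
Ks = torch.eye(3).expand(B, n_views, 3, 3)        # pinhole intrinsics

# Drop-in replacement for torch.nn.functional.scaled_dot_product_attention;
# the image/patch keyword names below are guesses for illustration.
out = prope_dot_product_attention(
    q, k, v,
    viewmats=viewmats,
    Ks=Ks,
    image_width=W,
    image_height=H,
    patch_size=patch,
)
```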
Highlighted Details
Maintenance & Community
No specific community channels or roadmap details are provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license.
Limitations & Caveats
The README does not detail specific limitations or known issues. Compatibility for commercial use or closed-source linking is not specified due to the lack of a stated license.