3D vision transformer positional encoding
This repository introduces PRoPE (Cameras as Relative Positional Encoding), a novel method for incorporating 3D geometric relationships between image tokens in multi-view transformers. It addresses the challenge of binding positional information in computer vision tasks, offering a simple and efficient alternative to existing approaches for applications like novel view synthesis.
How It Works
PRoPE leverages relative projective transformations to encode camera parameters, enabling transformers to understand the 3D spatial relationships between image patches. This approach is implemented as a drop-in replacement for standard scaled dot-product attention, directly integrating camera intrinsics and extrinsics into the attention mechanism.
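To make the idea concrete, here is a minimal sketch of the geometric quantity involved, assuming 4x4 world-to-camera matrices and 3x3 pinhole intrinsics; the function and argument names are illustrative, not the repository's API:

```python
import torch

def relative_projective_transform(viewmat_i: torch.Tensor, K_i: torch.Tensor,
                                  viewmat_j: torch.Tensor, K_j: torch.Tensor) -> torch.Tensor:
    """Relative projective transform taking camera i's image space to camera j's.

    viewmat_*: (4, 4) world-to-camera extrinsics; K_*: (3, 3) pinhole intrinsics.
    Shapes and names are assumptions for illustration only.
    """
    def full_projection(viewmat: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
        # Embed the 3x3 intrinsics in a 4x4 homogeneous matrix and compose
        # with the extrinsics to get a world -> image-space projection.
        K4 = torch.eye(4, dtype=viewmat.dtype)
        K4[:3, :3] = K
        return K4 @ viewmat

    P_i = full_projection(viewmat_i, K_i)
    P_j = full_projection(viewmat_j, K_j)
    # Undo camera i's projection, then apply camera j's: the result depends
    # only on the relative geometry between the two views, not on any
    # absolute world frame.
    return P_j @ torch.linalg.inv(P_i)
```

Because only relative transforms enter the attention computation, the encoding is invariant to the choice of world coordinate frame.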
Quick Start & Requirements
Implementations are provided for both JAX and PyTorch (prope/jax.py, prope/torch.py). The attention function expects viewmats (world-to-camera matrices), Ks (camera intrinsic matrices), the image dimensions, and the patch size. To adopt PRoPE, replace torch.nn.functional.scaled_dot_product_attention with prope_dot_product_attention, passing the camera parameters and image/patch dimensions, as in the sketch below.
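A hedged usage sketch of the PyTorch entry point: viewmats, Ks, and the module path come from the README, while the keyword names for the image and patch dimensions are assumptions; check prope/torch.py for the exact signature.

```python
import torch
from prope.torch import prope_dot_product_attention  # PyTorch implementation

B, n_views, H, W, patch = 2, 4, 256, 256, 16
n_tokens = n_views * (H // patch) * (W // patch)
heads, dim = 8, 64

# Query/key/value tokens for all patches across all views.
q = torch.randn(B, heads, n_tokens, dim)
k = torch.randn(B, heads, n_tokens, dim)
v = torch.randn(B, heads, n_tokens, dim)

viewmats = torch.eye(4).expand(B, n_views, 4, 4)  # world-to-camera extrinsics
Ks = torch.eye(3).expand(B, n_views, 3, 3)        # pinhole intrinsics

# Drop-in replacement for torch.nn.functional.scaled_dot_product_attention;
# the image/patch keyword names below are guesses for illustration.
out = prope_dot_product_attention(
    q, k, v,
    viewmats=viewmats,
    Ks=Ks,
    image_width=W,
    image_height=H,
    patch_size=patch,
)
```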
Highlighted Details
Maintenance & Community
No specific community channels or roadmap details are provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license.
Limitations & Caveats
The README does not detail specific limitations or known issues. Compatibility for commercial use or closed-source linking is not specified due to the lack of a stated license.