STream3R by NIRVANALAN

Scalable sequential 3D reconstruction with causal transformers

Created 4 months ago

284 stars

Top 92.2% on SourcePulse

Project Summary

STream3R reformulates dense 3D reconstruction as a sequential registration task solved with a causal Transformer. It processes image sequences as a stream, generalizes to dynamic scenes, and is compatible with LLM-style training infrastructure. It targets researchers and practitioners working on real-time 3D perception, robotics, and autonomous systems.

How It Works

STream3R uses a decoder-only Transformer with causal attention, so each incoming frame attends only to the frames that came before it. This streaming design, inspired by language modeling, avoids expensive global optimization and scales better with sequence length than approaches built on simple memory mechanisms. Because the architecture matches that of LLMs, it can reuse their efficiency tooling, such as FlashAttention kernels and KV caching, and it lends itself to large-scale pretraining and to handling dynamic scenes.
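The link between causal attention and KV caching can be illustrated with a toy example. The sketch below is not STream3R's code: it is a single-head, pure-Python attention over made-up 2-D frame features, and the names `StreamingCausalAttention` and `full_causal_attention` are hypothetical. It only shows that processing frames one at a time with cached keys and values gives the same output as a batch pass with a causal mask.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query over lists of keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

class StreamingCausalAttention:
    """Toy attention layer that caches keys/values of past frames (hypothetical)."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Cache the current frame, then attend over everything seen so far.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)

def full_causal_attention(queries, keys, values):
    """Batch reference: frame t attends to frames 0..t (causal mask)."""
    return [attend(queries[t], keys[:t + 1], values[:t + 1])
            for t in range(len(queries))]

# Made-up per-frame features; a real model would use learned projections
# of image tokens rather than reusing one vector as query, key, and value.
frames = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]

stream = StreamingCausalAttention()
streamed = [stream.step(f, f, f) for f in frames]
batch = full_causal_attention(frames, frames, frames)
```

Here `streamed` and `batch` hold the same values, which is why a causal model can run incrementally: each new frame costs one attention pass over the cache instead of a recomputation over the whole sequence.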

Quick Start & Requirements

To install, clone the repository, create a Conda environment with Python 3.11 and CMake 3.14.0, install the PyTorch build that matches your CUDA version (for example, cu126), install the remaining Python dependencies from requirements.txt, and then install the package itself (pip install -e .). Inference code and pre-trained weights are available on Hugging Face.
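Choosing the PyTorch build is the step most likely to go wrong, so here is a small helper as a sketch. It assumes the wheel index URL pattern published on the PyTorch install page (https://download.pytorch.org/whl/<tag>) and does not check that a wheel actually exists for a given tag. The function names are hypothetical, and the repository's own instructions may pin specific package versions that this sketch does not know about.

```python
def cuda_wheel_tag(cuda_version):
    """Turn a CUDA version string such as '12.6' into a wheel tag such as 'cu126'."""
    parts = cuda_version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"expected a version like '12.6', got {cuda_version!r}")
    major, minor = parts[0], parts[1]
    return f"cu{major}{minor}"

def torch_install_command(cuda_version):
    """Build a pip command (as an argument list) for the matching PyTorch wheel index."""
    tag = cuda_wheel_tag(cuda_version)
    index_url = f"https://download.pytorch.org/whl/{tag}"
    # Assumption: only 'torch' is listed here; add other packages as the
    # repository's instructions require.
    return ["pip", "install", "torch", "--index-url", index_url]

command = torch_install_command("12.6")
print(" ".join(command))
```

The command is returned as a list so it can be passed to `subprocess.run` without shell quoting, or joined into a string for copy and paste.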

Highlighted Details

Supports causal, sliding-window, and full attention, along with FlashAttention kernels and KV caching.
Reports state-of-the-art results on static and dynamic scene benchmarks, for example 0.057 mean accuracy (Acc) and 0.993 median normal consistency (NC) on NRGBD.
Provides detailed GPU memory and runtime benchmarks that show how cost scales with sequence length, with sliding-window attention scaling most efficiently.
Compatible with LLM-style training infrastructure for efficient large-scale pretraining and fine-tuning.
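The three attention modes listed above differ only in which earlier frames each frame may look at. The following sketch builds frame-level masks for each mode and counts the keys visible per frame, which is a rough proxy for why sliding-window attention keeps memory bounded as the sequence grows. The function names, the `window` parameter, and the choice to count the current frame inside the window are assumptions for illustration; the project's actual masking rules may differ.

```python
def attention_mask(num_frames, mode, window=None):
    """Return mask[i][j] == True when frame i may attend to frame j."""
    if mode not in ("full", "causal", "window"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "window" and (window is None or window < 1):
        raise ValueError("window mode needs window >= 1")
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            if mode == "full":
                allowed = True                 # every frame sees every frame
            elif mode == "causal":
                allowed = j <= i               # current and all past frames
            else:
                allowed = i - window < j <= i  # current and last window-1 frames
            row.append(allowed)
        mask.append(row)
    return mask

def keys_per_frame(mask):
    """Number of frames each query frame attends to."""
    return [sum(row) for row in mask]

full_counts = keys_per_frame(attention_mask(5, "full"))
causal_counts = keys_per_frame(attention_mask(5, "causal"))
window_counts = keys_per_frame(attention_mask(5, "window", window=2))
```

In this toy setting the causal count grows by one with every new frame, while the window count stops growing once the window is full, so a cache serving window attention never needs to hold more than `window` frames.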

Maintenance & Community

The project is led by researchers from Nanyang Technological University, Shanghai Artificial Intelligence Laboratory, Peking University, and The University of Hong Kong. Contact is available via email (lanyushi15@gmail.com) or GitHub issues. A "metric-scale version" is listed as a future TODO.

Licensing & Compatibility

The project is licensed under the NTU S-Lab License 1.0. Redistribution and use must adhere to this license, which permits non-commercial use and asks that the contributors be contacted for commercial use.

Limitations & Caveats

A "metric-scale version" of the reconstruction is still under development and not yet released. The installation of PyTorch requires careful selection based on the user's CUDA version.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

9 stars in the last 30 days