LIRA by zhen6618

Multimodal LLM for implicit instruction-guided 3D reconstruction

Created 11 months ago

318 stars

Top 85.4% on SourcePulse

Project Summary

Summary:

LIRA tackles 3D reconstruction driven by complex, implicit language instructions, a case that systems relying on explicit commands handle poorly. It is aimed at computer vision and multimodal AI researchers and provides an LLM-driven system for real-time scene reconstruction, which the authors report outperforms existing methods.

How It Works:

LIRA incrementally reconstructs 3D scenes from RGB-D sequences, guided by complex instructions. A multimodal LLM interprets the implicit instruction and identifies the matching 2D instances, which are then projected into a 3D map. The Text-enhanced Instance Fusion (TIFF) module improves fusion quality by processing multiple keyframes simultaneously within a Fragment bounding volume. The project also introduces the ReasonRecon benchmark for evaluating this task.
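
The step of projecting 2D instances into a 3D map can be illustrated with standard pinhole back-projection. This is a minimal sketch, not LIRA's implementation: the function name `backproject_instance`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy frame are all hypothetical, and the sketch stops at camera-space points (no camera pose, no fusion across keyframes). It uses plain Python lists for clarity; a real pipeline would likely vectorize this.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_instance(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift the masked pixels of one RGB-D frame into camera-space 3D points.

    Standard pinhole model (an assumption, not taken from the LIRA code):
    X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    points: List[Point3D] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the instance or without a valid depth reading.
            if not inside or z <= 0.0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


# Toy 2x2 frame: depth in metres, mask marks the pixels of one 2D instance.
depth = [[1.0, 2.0], [0.0, 4.0]]
mask = [[True, False], [True, True]]
points = backproject_instance(depth, mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(points)
```

In this toy frame only two pixels survive: one masked pixel is dropped because its depth is zero, and the unmasked pixel is ignored.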

Quick Start & Requirements:

Installation: Requires Python 3.9, PyTorch 2.0.0 (CUDA 11.7), mmcv-full, sparsehash, and deepspeed. Setup consists of creating a conda environment, installing dependencies with pip and mim, and cloning the repository.
Data: The ScanNet dataset must be processed with the provided scripts, and the ReasonRecon segmentation and reconstruction datasets must be downloaded.
Training/Inference: Scripts are provided for training 2D segmentation (with LoRA), for training the main reconstruction pipeline, and for inference. Pre-trained weights for 2D segmentation and TIFF are available.
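
Because the setup pins specific versions, it may help to check the environment before starting dataset preparation. The sketch below is not part of the repository. The helper names `installed_versions` and `python_matches` are hypothetical, and it assumes the pip distribution names match the names listed above. sparsehash is assumed to be installed as a C++ library rather than a pip package, so it is not checked.

```python
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Python packages named in the requirements. sparsehash is assumed to be a
# C++ library rather than a pip package, so it is left out of this check.
PACKAGES = ("torch", "mmcv-full", "deepspeed")


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


def python_matches(
    required: Tuple[int, int] = (3, 9),
    current: Optional[Tuple[int, ...]] = None,
) -> bool:
    """Return True if the interpreter's major.minor version equals `required`."""
    if current is None:
        current = tuple(sys.version_info[:2])
    return tuple(current[:2]) == tuple(required)


report = installed_versions()
print("Python 3.9:", python_matches())
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

The script only reports what is installed. Comparing the reported torch version against 2.0.0 and confirming the CUDA 11.7 build is left to the reader, since the exact version string depends on how PyTorch was installed.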

Highlighted Details:

Introduces the "reasoning reconstruction" task and the ReasonRecon benchmark, featuring extensive scene-instruction data for implicit reasoning.
Claims real-time performance and superior results over existing methods.
Features the TIFF module for enhanced instance fusion quality.

Maintenance & Community:

The project acknowledges several foundational libraries and datasets (LLaVA, ScanNet, etc.). The README does not list community channels or maintenance contacts.

Licensing & Compatibility:

The README does not specify the software license or compatibility details for commercial use.

Limitations & Caveats:

The project is associated with an upcoming ICCV 2025 publication, so development may still be ongoing. Setup, including dataset preparation and multi-stage training, appears complex and resource-intensive.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days