LIRA by zhen6618

Multimodal LLM for implicit instruction-guided 3D reconstruction

Created 7 months ago
318 stars

Top 84.9% on SourcePulse

Project Summary

Summary:

LIRA tackles 3D reconstruction from complex, implicit language instructions, overcoming limitations of systems relying on explicit commands. It targets computer vision and multimodal AI researchers, offering an LLM-driven system for real-time scene reconstruction with improved accuracy.

How It Works:

LIRA performs incremental 3D reconstruction from RGB-D sequences, guided by complex instructions. It uses a multimodal LLM to interpret implicit commands and identify the matching 2D instances, then projects those instances into a 3D map. A Text-enhanced Instance Fusion (TIFF) module improves instance fusion quality by processing multiple keyframes simultaneously within Fragment bounding volumes. The project also introduces the ReasonRecon benchmark for evaluating this task.
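The step of projecting 2D instances into a 3D map can be illustrated with standard pinhole back-projection. This is a minimal sketch, not LIRA's actual code: the function name `backproject_instance`, the argument layout, and the use of a 4x4 camera-to-world pose are all assumptions made for illustration. It lifts the pixels of one instance mask into world-space points using the depth channel of an RGB-D frame.

```python
import numpy as np


def backproject_instance(depth, mask, K, cam_to_world):
    """Lift the masked pixels of a depth image into world-space 3D points.

    Hypothetical helper for illustration only.
    depth:        (H, W) depth in metres, 0 where invalid
    mask:         (H, W) instance mask (truthy inside the instance)
    K:            (3, 3) pinhole intrinsics
    cam_to_world: (4, 4) camera pose
    Returns an (N, 3) array of world-space points.
    """
    valid = mask.astype(bool) & (depth > 0)
    v, u = np.nonzero(valid)          # pixel rows (v) and columns (u)
    z = depth[v, u]

    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # Homogeneous camera-space points, then transform into the world frame.
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    pts_world = pts_cam @ cam_to_world.T
    return pts_world[:, :3]
```

Points produced this way for each keyframe could then be accumulated per instance; how LIRA fuses them across keyframes is the job of the TIFF module and is not reproduced here.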

Quick Start & Requirements:

  • Installation: Requires Python 3.9, PyTorch 2.0.0 (CUDA 11.7), mmcv-full, sparsehash, and deepspeed. Setup involves conda environment creation, dependency installation (pip, mim), and repository cloning.
  • Data: Needs ScanNet dataset processing (via provided scripts) and downloading specific ReasonRecon datasets (segmentation and reconstruction).
  • Training/Inference: Scripts are available for training 2D segmentation (with LoRA) and the main reconstruction pipeline, as well as for inference. Pre-trained weights for 2D segmentation and TIFF are provided.
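Before running the training or inference scripts, it can help to confirm that the environment matches the stated requirements. The following is a rough sketch and not part of the repository: the function `check_environment` is hypothetical, and the pip distribution names in `REQUIRED` are assumptions, since the summary lists the dependencies without exact package names. sparsehash is left out because it is assumed to be installed as a system library rather than a pip package.

```python
import sys
from importlib import metadata

# Assumed pip distribution names mapped to an expected version prefix
# (None means "any version"). Only the PyTorch version is stated above.
REQUIRED = {"torch": "2.0.0", "mmcv-full": None, "deepspeed": None}


def check_environment(required=REQUIRED, expected_python=(3, 9),
                      python_version=None, get_version=metadata.version):
    """Return a list of human-readable problems; empty if all checks pass."""
    problems = []
    current = tuple(python_version or sys.version_info[:2])
    if current != tuple(expected_python):
        problems.append(
            f"Python {current[0]}.{current[1]} found, "
            f"expected {expected_python[0]}.{expected_python[1]}"
        )
    for name, prefix in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        # A build such as "2.0.0+cu117" still matches the "2.0.0" prefix.
        if prefix and not found.startswith(prefix):
            problems.append(f"{name}: found {found}, expected {prefix}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The `python_version` and `get_version` parameters exist only so the check can be exercised without the real packages installed.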

Highlighted Details:

  • Introduces the "reasoning reconstruction" task and the ReasonRecon benchmark, featuring extensive scene-instruction data for implicit reasoning.
  • Claims real-time performance and superior results over existing methods.
  • Features the TIFF module for enhanced instance fusion quality.

Maintenance & Community:

The project acknowledges several foundational libraries and datasets (LLaVA, ScanNet, etc.). No specific community channels or maintenance contacts are detailed in the provided text.

Licensing & Compatibility:

The README does not specify the software license or compatibility details for commercial use.

Limitations & Caveats:

The project is associated with an upcoming ICCV 2025 publication, so the code may still be under active development. The setup process, including dataset preparation and multi-stage training, appears complex and resource-intensive.

Health Check:

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
