3D-Mem by UMass-Embodied-AGI

3D Scene Memory for Embodied Exploration and Reasoning

Created 1 year ago

264 stars

Top 96.5% on SourcePulse

Project Summary

UMass-Embodied-AGI/3D-Mem is the official source code for the CVPR 2025 paper "3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning." It introduces a 3D scene memory system that lets embodied agents explore and reason about complex 3D environments. The repository targets researchers and engineers in embodied AI who work on tasks such as embodied question answering and scene understanding.

How It Works

The project's core innovation is the "3D Scene Memory," a system designed to store and retrieve spatio-temporal information about 3D environments for embodied agents. This memory mechanism facilitates more intelligent exploration and reasoning by allowing agents to build and utilize a rich representation of their surroundings. The specific algorithms and architectural choices underpinning this memory are detailed in the associated CVPR 2025 publication.
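To make the idea of storing and retrieving spatio-temporal information concrete, here is a minimal toy sketch. It is not the 3D-Mem data structure or the paper's algorithm; the class `ToySceneMemory`, its fields, and the radius-based query are all assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Observation:
    step: int        # time step at which the agent made the observation
    position: tuple  # (x, y, z) agent position, units are arbitrary here
    label: str       # free-form description of what was seen


class ToySceneMemory:
    """Illustrative store only; hypothetical, not the 3D-Mem implementation."""

    def __init__(self):
        self._items = []

    def add(self, step, position, label):
        # Record what was seen, where, and when.
        self._items.append(Observation(step, tuple(position), label))

    def query(self, position, radius, since_step=0):
        """Return observations within `radius` of `position`, newest first."""
        hits = [
            o for o in self._items
            if o.step >= since_step and math.dist(o.position, position) <= radius
        ]
        return sorted(hits, key=lambda o: o.step, reverse=True)
```

An agent could call `add` after each exploration step and `query` when it needs context about a location, for example `memory.query((0, 0, 0), radius=2.0)`. The actual memory design is described in the paper.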

Quick Start & Requirements

Installation: Requires setting up a conda environment with Python 3.9, PyTorch 2.0.1 (with CUDA 11.8 support), Habitat-Sim 0.2.5, PyTorch3D 0.7.4, and various other libraries including omegaconf, open-clip-torch, ultralytics, supervision, opencv-python-headless, scikit-learn, scikit-image, open3d, hipart, openai, and httpx.
Prerequisites: Download the HM3D dataset and specify its path. An OpenAI API endpoint and key must also be configured.
Evaluation: Scripts are provided for running evaluations on A-EQA and GOAT-Bench benchmarks.
Links: Paper on arXiv, CVPR 2025 publication details.
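Because the setup pins several versions, a quick check of the active environment can catch mismatches before an evaluation run. The sketch below uses only the standard library. The pinned versions come from the list above, but the distribution names used as dictionary keys are assumptions and may differ from the names under which the packages are actually installed.

```python
from importlib import metadata

# Versions from the requirements listed above.
# The distribution names (keys) are assumptions and may need adjusting.
PINNED = {
    "torch": "2.0.1",
    "habitat-sim": "0.2.5",
    "pytorch3d": "0.7.4",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pinned=PINNED, lookup=installed_version):
    """Map each mismatched package to a (wanted, found) pair."""
    problems = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Local build tags such as "2.0.1+cu118" still count as a match.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in check_pins().items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

The `lookup` parameter exists so the comparison logic can be exercised without the real packages installed.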

Highlighted Details

Official source code for the CVPR 2025 paper "3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning."
Includes inference code and evaluation scripts for A-EQA and GOAT-Bench benchmarks.
Supports evaluation on the HM3D dataset.

Maintenance & Community

The provided README does not contain specific details regarding maintainers, community channels (e.g., Discord, Slack), roadmaps, or notable sponsorships.

Licensing & Compatibility

The README does not explicitly state the project's license. Confirm the licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

Saving visualizations during evaluation can significantly slow down the process. For the GOAT-Bench evaluation, only the first of 10 explore episodes per scene is tested by default due to time and resource constraints. The setup requires downloading a substantial dataset (HM3D) and configuring external API access (OpenAI).
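The GOAT-Bench default described above amounts to keeping only the first episode of each scene. A minimal sketch of that selection follows; the function name `select_episodes`, the dictionary layout, and the scene names are hypothetical and do not reflect the repository's actual evaluation script or options.

```python
def select_episodes(episodes_by_scene, per_scene=1):
    """Keep the first `per_scene` episodes of each scene, preserving order."""
    return {
        scene: list(episodes)[:per_scene]
        for scene, episodes in episodes_by_scene.items()
    }


# Hypothetical layout: two scenes with 10 exploration episodes each.
all_episodes = {
    "scene_a": list(range(10)),
    "scene_b": list(range(10)),
}

subset = select_episodes(all_episodes)
total = sum(len(eps) for eps in all_episodes.values())
kept = sum(len(eps) for eps in subset.values())
print(f"evaluating {kept} of {total} episodes")
```

With 10 episodes per scene, this keeps one tenth of the episodes, which is the trade-off between evaluation cost and coverage that the default makes.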

Health Check

Last Commit

9 months ago

Responsiveness

Inactive

Star History

8 stars in the last 30 days