GRANT by H-EmbodVis

Embodied agents for parallel task execution

Created 3 months ago

357 stars


Project Summary

This project addresses the limitations of existing embodied AI task scheduling datasets by incorporating Operations Research (OR) principles and 3D spatial grounding. It introduces the ORS3D task and the GRANT model, enabling embodied agents to understand natural language instructions, ground actions in 3D environments, and optimize task execution by leveraging parallelizable subtasks to minimize completion time. The target audience includes researchers in embodied AI, robotics, and AI task planning.

How It Works

The ORS3D task requires agents to carry out complex, multi-step tasks in 3D environments while optimizing for efficiency. It is paired with GRANT, an embodied multi-modal large language model that incorporates a novel scheduling token mechanism. This mechanism lets the model identify and exploit parallelizable subtasks, so the overall task finishes sooner than it would with purely sequential execution.
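To make the efficiency argument concrete, here is a toy sketch in Python. It is not GRANT's scheduling token mechanism. It assumes a simplified model in which each subtask has an active phase that occupies the agent and an optional unattended phase (for example, an appliance running) during which the agent is free, that each unattended phase uses its own device, and that subtasks have no ordering dependencies. The names `Subtask`, `sequential_time`, and `parallel_time` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    # Hypothetical toy model, not the ORS3D data format.
    name: str
    active: float            # time the agent must spend working on the subtask
    unattended: float = 0.0  # time the subtask then runs on its own (e.g. a microwave heating)


def sequential_time(tasks: list[Subtask]) -> float:
    # Baseline: the agent waits for each subtask to fully finish before starting the next.
    return sum(t.active + t.unattended for t in tasks)


def parallel_time(tasks: list[Subtask]) -> tuple[float, list[str]]:
    # Start subtasks with the longest unattended phase first, so their waiting
    # overlaps with the agent's later active work.
    order = sorted(tasks, key=lambda t: t.unattended, reverse=True)
    clock = 0.0   # when the agent becomes free again
    finish = 0.0  # when the last subtask (including background phases) completes
    for t in order:
        clock += t.active                            # agent is busy for the active phase
        finish = max(finish, clock + t.unattended)   # unattended phase runs in the background
    return finish, [t.name for t in order]


tasks = [
    Subtask("clean table", 5),
    Subtask("heat food in microwave", 1, 4),
    Subtask("run dishwasher", 2, 10),
]
print(sequential_time(tasks))  # 22.0
print(parallel_time(tasks))    # (12.0, ['run dishwasher', 'heat food in microwave', 'clean table'])
```

Under these assumptions, ordering by longest unattended phase first is Jackson's rule for single-machine scheduling with delivery times, which minimizes the overall finish time. Real ORS3D tasks may involve dependencies and spatial travel that this toy model ignores.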

Quick Start & Requirements

Installation: Requires Python 3.10.16, PyTorch 1.12.1+cu116, and CUDA 11.6. Setup uses a conda environment, installs specific dependencies (openblas-devel, openjdk=11, torch-scatter, peft), and compiles C++ extensions (MinkowskiEngine, pointnet2).
Data & Weights: Download the ORS3D-60K dataset from HuggingFace and the 3D scenes from SceneVerse. The pretrained LLM (Tiny-Vicuna-1B) and the model weights must be downloaded separately.
Execution: Training and evaluation are initiated via bash scripts/train.sh and bash scripts/eval.sh.
Resources: Links to Project Homepage, Dataset, and arXiv.
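Because the build depends on exact versions, a quick pre-flight check can catch mismatches before the C++ extensions are compiled. The sketch below is a hypothetical helper, not part of the repository; `REQUIRED`, `detect_installed`, and `find_mismatches` are illustrative names. It compares the interpreter and PyTorch build against the versions pinned above. Note that `torch.version.cuda` reports the CUDA version PyTorch was built with, which may differ from the system toolkit (`nvcc`) used to compile MinkowskiEngine and pointnet2.

```python
import sys

# Versions pinned in the installation instructions. This is an exact-match
# check; the Python patch level may not actually matter, so relax it if desired.
REQUIRED = {"python": "3.10.16", "torch": "1.12.1+cu116", "cuda": "11.6"}


def detect_installed() -> dict:
    """Collect the versions visible to the current interpreter."""
    found = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        return found
    found["torch"] = torch.__version__
    # CUDA version PyTorch was *built* with (None for CPU-only builds),
    # not necessarily the system toolkit used to compile C++ extensions.
    found["cuda"] = torch.version.cuda
    return found


def find_mismatches(found: dict) -> list:
    """Return one message per pinned component whose version differs."""
    return [
        f"{name}: expected {want}, found {found.get(name)}"
        for name, want in REQUIRED.items()
        if found.get(name) != want
    ]


if __name__ == "__main__":
    issues = find_mismatches(detect_installed())
    if issues:
        print("Environment differs from the pinned setup:")
        for line in issues:
            print("  " + line)
    else:
        print("Environment matches the pinned versions.")
```

Running this inside the conda environment before compiling the extensions gives an early signal that a mismatched PyTorch or CUDA build was installed.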

Highlighted Details

Accepted as an oral presentation at AAAI 2026, a track with an approximate acceptance rate of 4.5%.
Features the ORS3D-60K dataset, comprising 60,000 composite tasks across 4,000 real-world scenes.
GRANT is an embodied multi-modal LLM specifically designed for efficient, parallelized task scheduling in 3D environments.

Maintenance & Community

The project builds on prior work including Grounded 3D-LLM, SG3D, and LEO. The README does not describe community channels or an active maintenance team.

Licensing & Compatibility

The code is licensed under Apache 2.0, a permissive license that generally allows commercial use. The README gives no further notes on commercial use or closed-source linking.

Limitations & Caveats

The setup process requires precise version management for Python, PyTorch, and CUDA, along with manual compilation of C++ extensions, indicating a potentially complex and fragile build environment. No explicit limitations regarding unsupported platforms, specific task complexities, or performance bounds are detailed.


Explore Similar Projects

JARVIS-1 by CraftJarvis

agent-actors by shaman-ai

lyzr-automata by LyzrCore

swarm-tools by joelhooks

taskgen by simbianai

agentchain by jina-ai

Kimi-K2.5 by MoonshotAI

smartgpt by Cormanz

multi-agent-shogun by yohey-w

XAgent by OpenBMB

joyagent-jdgenie by jd-opensource

JARVIS by microsoft