GRANT by H-EmbodVis

Embodied agents for parallel task execution

Created 1 month ago
355 stars

Top 78.8% on SourcePulse

Project Summary

This project addresses limitations of existing embodied AI task-scheduling datasets by incorporating Operations Research (OR) principles and 3D spatial grounding. It introduces the ORS3D task and the GRANT model, which let embodied agents understand natural language instructions, ground actions in 3D environments, and minimize completion time by exploiting parallelizable subtasks. The target audience includes researchers in embodied AI, robotics, and AI task planning.

How It Works

The ORS3D task requires agents to carry out complex, multi-step tasks in 3D environments while optimizing for efficiency. GRANT, an embodied multi-modal large language model, handles this with a novel scheduling token mechanism. The mechanism lets the model identify and exploit parallelizable subtasks, so the overall task finishes sooner than it would under strictly sequential execution.
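To make the gain from parallelizable subtasks concrete, here is a minimal sketch of the underlying scheduling idea. It is not GRANT's scheduling token mechanism or the ORS3D evaluation code. The `Subtask` model (an `active` phase where the agent is busy, followed by an optional unattended `wait` phase), the longest-wait-first ordering, and the example durations are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    """Hypothetical subtask model, not taken from the ORS3D dataset schema."""
    name: str
    active: int    # minutes during which the agent itself is busy
    wait: int = 0  # minutes the subtask keeps running unattended afterwards


def sequential_time(tasks):
    """Total time if the agent idles through every unattended phase."""
    return sum(t.active + t.wait for t in tasks)


def parallel_time(tasks):
    """Completion time when unattended phases overlap with later work.

    Assumed greedy ordering: start the subtasks with the longest
    unattended phase first, then fill the waiting time with the rest.
    """
    ordered = sorted(tasks, key=lambda t: t.wait, reverse=True)
    clock = 0   # time at which the agent becomes free again
    finish = 0  # time at which the last subtask has fully completed
    for t in ordered:
        clock += t.active
        finish = max(finish, clock + t.wait)
    return finish


# Illustrative durations only.
TASKS = [
    Subtask("start washing machine", active=2, wait=30),
    Subtask("heat food in microwave", active=1, wait=5),
    Subtask("wipe table", active=10),
    Subtask("water plants", active=8),
]

print(sequential_time(TASKS))  # 56
print(parallel_time(TASKS))    # 32
```

In this toy example the washing machine's 30-minute unattended phase absorbs all the other work, so the schedule drops from 56 to 32 minutes.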

Quick Start & Requirements

  • Installation: Requires Python 3.10.16, PyTorch 1.12.1+cu116, and CUDA 11.6. Setup involves conda environment management, installing specific dependencies (openblas-devel, openjdk=11, torch-scatter, peft), and compiling C++ extensions (MinkowskiEngine, pointnet2).
  • Data & Weights: Download ORS3D-60K dataset from HuggingFace and 3D scenes from SceneVerse. Pretrained LLM (Tiny-Vicuna-1B) and model weights must be downloaded separately.
  • Execution: Training and evaluation are initiated via bash scripts/train.sh and bash scripts/eval.sh.
  • Resources: The README links to the project homepage, the dataset, and the arXiv paper.
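Because the setup depends on exact versions, a quick check before compiling the C++ extensions may save time. The sketch below is not part of the repository: the helper names `detect_versions` and `mismatches` are hypothetical, and the only values taken from the summary are the three version pins. It relies on `platform.python_version()`, `torch.__version__`, and `torch.version.cuda`, and reports PyTorch as missing if it cannot be imported.

```python
import platform

# Version pins quoted in the installation notes above.
EXPECTED = {"python": "3.10.16", "torch": "1.12.1+cu116", "cuda": "11.6"}


def detect_versions():
    """Collect the versions visible to the current interpreter."""
    found = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch  # optional: left as None when PyTorch is absent
        found["torch"] = str(torch.__version__)
        found["cuda"] = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        pass
    return found


def mismatches(found, expected=EXPECTED):
    """Return {name: (found, expected)} for every pin that does not match."""
    return {
        name: (found.get(name), want)
        for name, want in expected.items()
        if found.get(name) != want
    }


def report():
    """Print one line per mismatch, or a confirmation if all pins match."""
    problems = mismatches(detect_versions())
    for name, (got, want) in problems.items():
        print(f"{name}: found {got}, expected {want}")
    if not problems:
        print("All pinned versions match.")
```

Calling `report()` inside the activated conda environment prints any pin that differs. Exact string equality is deliberately strict here, so relax the comparison if nearby versions turn out to work.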

Highlighted Details

  • Accepted as an Oral presentation at AAAI 2026 with an approximate acceptance rate of 4.5%.
  • Features the ORS3D-60K dataset, comprising 60,000 composite tasks across 4,000 real-world scenes.
  • GRANT is an embodied multi-modal LLM specifically designed for efficient, parallelized task scheduling in 3D environments.

Maintenance & Community

The project builds on earlier work, including Grounded 3D-LLM, SG3D, and LEO. The README does not list community channels or name an active maintenance team.

Licensing & Compatibility

The code is licensed under Apache 2.0. No specific compatibility notes for commercial use or closed-source linking are mentioned.

Limitations & Caveats

Setup requires exact versions of Python, PyTorch, and CUDA, plus manual compilation of C++ extensions, so the build environment is likely to be complex and fragile. The README does not state explicit limitations such as unsupported platforms, task complexity limits, or performance bounds.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
