JARVIS-1 by CraftJarvis

Minecraft agent for open-world multi-tasking with multimodal input

Created 2 years ago

390 stars

Top 73.6% on SourcePulse

Project Summary

JARVIS-1 is an open-world agent designed for complex, long-horizon tasks in the Minecraft environment. It targets researchers and developers in embodied AI and reinforcement learning. The agent perceives multimodal inputs, generates multi-step plans, and executes embodied control, with the aim of planning and interacting the way a human player would.

How It Works

JARVIS-1 leverages pre-trained multimodal language models to map visual observations and textual instructions into actionable plans. These plans are then executed by goal-conditioned controllers. A key innovation is its multimodal memory, which integrates both pre-trained knowledge and in-game experiences to enhance planning capabilities. This approach allows for sophisticated task decomposition and execution in a dynamic, open-world setting.
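The plan, execute, and remember loop described above can be illustrated with a toy sketch. This is not the JARVIS-1 implementation: the names `Experience`, `Memory`, `plan`, and `run_task` are hypothetical, word overlap stands in for multimodal retrieval, and a plain callback stands in for the goal-conditioned controller. It only shows how past successful plans might be recalled to guide a new, similar task.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Experience:
    """One remembered episode: a task and the plan that completed it (hypothetical)."""
    task: str
    plan: list[str]


@dataclass
class Memory:
    """Toy stand-in for the multimodal memory; stores text only."""
    experiences: list[Experience] = field(default_factory=list)

    def retrieve(self, task: str, min_score: float = 0.5) -> Experience | None:
        """Return the experience whose task has the highest word overlap
        (Jaccard similarity) with `task`, or None if none reaches `min_score`."""
        query = set(task.lower().split())
        best, best_score = None, 0.0
        for exp in self.experiences:
            words = set(exp.task.lower().split())
            union = query | words
            if not union:
                continue
            score = len(query & words) / len(union)
            if score > best_score:
                best, best_score = exp, score
        return best if best_score >= min_score else None


def plan(task: str, observation: str, memory: Memory) -> list[str]:
    """Stand-in for the language-model planner.

    A real planner would condition on `observation`; this stub ignores it
    and simply reuses a recalled plan when one is available.
    """
    recalled = memory.retrieve(task)
    if recalled is not None:
        return list(recalled.plan)
    return [task]  # nothing relevant in memory: treat the task as one sub-goal


def run_task(task: str, observation: str, memory: Memory,
             controller: Callable[[str], bool]) -> bool:
    """Plan, run each sub-goal through `controller`, and store successes."""
    sub_goals = plan(task, observation, memory)
    for goal in sub_goals:
        if not controller(goal):
            return False  # stop at the first failed sub-goal
    memory.experiences.append(Experience(task, sub_goals))
    return True
```

Storing only successful episodes keeps the recalled plans trustworthy in this sketch; how the actual system filters or weights its in-game experiences is not described in this summary.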

Quick Start & Requirements

Install: pip install -e . (after environment setup)
Prerequisites: Linux only, Python 3.10, JDK 8, Anaconda recommended. Requires downloading the STEVE-1 weights and setting the environment variables TMPDIR=/tmp and OPENAI_API_KEY.
Setup: Requires running prepare_mcp.py to build MCP-Reborn.
Docs: Website, Paper
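Before installing, it can help to confirm that the prerequisites listed above are in place. The following is a minimal preflight sketch, not part of the project: `check_prerequisites` is a hypothetical helper, and it only checks that a `java` executable is on the PATH, not that it is JDK 8, nor that the STEVE-1 weights have been downloaded.

```python
import os
import platform
import shutil
import sys


def check_prerequisites(env=None, system=None, py_version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed.

    The parameters exist so the checks can be exercised with fake values;
    by default the real environment is inspected.
    """
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    py_version = tuple(sys.version_info[:2]) if py_version is None else tuple(py_version)

    problems = []
    if system != "Linux":
        problems.append(f"Linux is required, found {system}")
    if py_version != (3, 10):
        problems.append(f"Python 3.10 is required, found {py_version[0]}.{py_version[1]}")
    if which("java") is None:
        problems.append("no 'java' executable on PATH (JDK 8 is required)")
    if env.get("TMPDIR") != "/tmp":
        problems.append("TMPDIR should be set to /tmp")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("MISSING:", issue)
    print("ok" if not issues else f"{len(issues)} problem(s) found")
```

Running the script in the shell that will later run the agent reports each missing item on its own line.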

Highlighted Details

Capable of over 200 tasks in Minecraft, ranging from short-horizon (chopping trees) to long-horizon (obtaining a diamond pickaxe).
Reported to outperform prior state-of-the-art agents by 5x on the long-horizon "ObtainDiamondPickaxe" task.
Utilizes a memory-augmented multimodal language model for planning.
Built upon STEVE-1 (VPT model) and integrates with MineDojo and MC-TextWorld.

Maintenance & Community

Project associated with Zihao Wang, Shaofei Cai, and others.
Updates are posted on Twitter.

Licensing & Compatibility

License details are not explicitly stated in the README.

Limitations & Caveats

The current release only includes offline evaluation code and a partial multimodal memory. Key components like the multimodal descriptor, multimodal retrieval, and online learning capabilities are marked as "Coming Soon" or not yet released, limiting the agent's full functionality.
