JARVIS-1 by CraftJarvis

Minecraft agent for open-world multi-tasking with multimodal input

Created 1 year ago
378 stars

Top 75.2% on SourcePulse

View on GitHub
Project Summary

JARVIS-1 is an open-world agent designed for complex, long-horizon tasks in the Minecraft environment. Aimed at researchers and developers in embodied AI and reinforcement learning, it perceives multimodal inputs, generates sophisticated plans, and executes embodied control, mimicking human-like planning and interaction.

How It Works

JARVIS-1 leverages pre-trained multimodal language models to map visual observations and textual instructions into actionable plans. These plans are then executed by goal-conditioned controllers. A key innovation is its multimodal memory, which integrates both pre-trained knowledge and in-game experiences to enhance planning capabilities. This approach allows for sophisticated task decomposition and execution in a dynamic, open-world setting.
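The loop described above can be sketched as plan-then-execute with a growing memory. This is a minimal, hypothetical illustration: the class and function names are placeholders, not the project's actual API, and the LLM planner and STEVE-1 controller are replaced by trivial stubs.

```python
from dataclasses import dataclass, field

@dataclass
class MultimodalMemory:
    """Stores (task, plan, outcome) entries for reuse during planning."""
    entries: list = field(default_factory=list)

    def add(self, task, plan, success):
        self.entries.append({"task": task, "plan": plan, "success": success})

    def retrieve(self, task):
        # Return past successful plans for this task, most recent first.
        return [e["plan"] for e in reversed(self.entries)
                if e["task"] == task and e["success"]]

def plan(task, observation, memory):
    """Stand-in for the multimodal LLM planner: reuse a remembered plan
    if one exists, otherwise decompose the task into placeholder sub-goals."""
    remembered = memory.retrieve(task)
    if remembered:
        return remembered[0]
    return [f"subgoal: gather materials for {task}",
            f"subgoal: craft {task}"]

def execute(subgoal):
    """Stand-in for the goal-conditioned controller (STEVE-1 in the real
    system); here every sub-goal trivially succeeds."""
    return True

def run_task(task, observation, memory):
    steps = plan(task, observation, memory)
    success = all(execute(sg) for sg in steps)
    memory.add(task, steps, success)  # in-game experience enriches memory
    return success

memory = MultimodalMemory()
run_task("wooden_pickaxe", observation="<frame>", memory=memory)
# A repeat attempt retrieves the stored plan instead of re-planning.
run_task("wooden_pickaxe", observation="<frame>", memory=memory)
```

The point of the sketch is the feedback path: each execution outcome is written back into memory, so later planning calls can condition on in-game experience as well as pre-trained knowledge.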

Quick Start & Requirements

  • Install: pip install -e . (after environment setup)
  • Prerequisites: Linux only, Python 3.10, JDK 8; Anaconda recommended. Also requires downloading the STEVE-1 weights and setting the TMPDIR=/tmp and OPENAI_API_KEY environment variables.
  • Setup: Requires running prepare_mcp.py to build MCP-Reborn.
  • Docs: Website, Paper

Highlighted Details

  • Capable of over 200 tasks in Minecraft, ranging from short-horizon (chopping trees) to long-horizon (obtaining a diamond pickaxe).
  • Outperforms state-of-the-art agents by 5x in the "ObtainDiamondPickaxe" task.
  • Utilizes a memory-augmented multimodal language model for planning.
  • Built upon STEVE-1 (VPT model) and integrates with Minedojo and MC-TextWorld.

Maintenance & Community

  • Project associated with Zihao Wang, Shaofei Cai, and others.
  • Twitter for updates.

Licensing & Compatibility

  • License details are not explicitly stated in the README.

Limitations & Caveats

The current release includes only offline evaluation code and a partial multimodal memory. Key components such as the multimodal descriptor, multimodal retrieval, and online learning are marked "Coming Soon" or not yet released, so the agent's full capabilities are not yet available.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Vincent Weisser (cofounder of Prime Intellect).

GITM by OpenGVLab

633 stars
LLM agent for Minecraft open-world environments
Created 2 years ago · Updated 2 years ago