Minecraft agent for open-world multi-tasking with multimodal input
JARVIS-1 is an open-world agent designed for complex, long-horizon tasks within the Minecraft environment. It targets researchers and developers in embodied AI and reinforcement learning, enabling agents to perceive multimodal inputs, generate sophisticated plans, and execute embodied control, mimicking human-like planning and interaction.
How It Works
JARVIS-1 leverages pre-trained multimodal language models to map visual observations and textual instructions into actionable plans. These plans are then executed by goal-conditioned controllers. A key innovation is its multimodal memory, which integrates both pre-trained knowledge and in-game experiences to enhance planning capabilities. This approach allows for sophisticated task decomposition and execution in a dynamic, open-world setting.
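The loop below is a minimal sketch of this perceive-retrieve-plan-execute cycle. All names here (MultimodalMemory, propose_plan, GoalConditionedController) are illustrative stand-ins under assumed behavior, not the actual JARVIS-1 interfaces.

```python
# Sketch of the plan-then-execute loop with a multimodal memory.
# Class and function names are hypothetical, not the JARVIS-1 API.
from dataclasses import dataclass, field


@dataclass
class MultimodalMemory:
    """Stores (task, plan, outcome) entries so past experience can guide new plans."""
    entries: list = field(default_factory=list)

    def add(self, task: str, plan: list[str], success: bool) -> None:
        self.entries.append({"task": task, "plan": plan, "success": success})

    def retrieve(self, task: str, k: int = 3) -> list[dict]:
        # Naive keyword match; the real system performs multimodal retrieval.
        hits = [e for e in self.entries if any(w in e["task"] for w in task.split())]
        return hits[:k]


def propose_plan(instruction: str, observation: str, examples: list[dict]) -> list[str]:
    """Placeholder for the multimodal language model that decomposes a task into sub-goals."""
    # JARVIS-1 conditions a pre-trained multimodal LM on the visual observation,
    # the instruction, and retrieved experience; here we return a canned plan.
    if "diamond" in instruction:
        return ["obtain wooden pickaxe", "obtain stone pickaxe",
                "obtain iron pickaxe", "mine diamond"]
    return [instruction]


class GoalConditionedController:
    """Placeholder for the low-level controller that executes a single sub-goal."""

    def execute(self, goal: str, observation: str) -> bool:
        print(f"executing sub-goal: {goal}")
        return True  # pretend every sub-goal succeeds in this sketch


def run_task(instruction: str, memory: MultimodalMemory) -> bool:
    observation = "<current screenshot features>"
    examples = memory.retrieve(instruction)
    plan = propose_plan(instruction, observation, examples)
    controller = GoalConditionedController()
    success = all(controller.execute(goal, observation) for goal in plan)
    memory.add(instruction, plan, success)  # store the experience for future planning
    return success


if __name__ == "__main__":
    memory = MultimodalMemory()
    run_task("mine a diamond", memory)
```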
Quick Start & Requirements
Install with pip install -e . (after environment setup). Set TMPDIR=/tmp and OPENAI_API_KEY, then run prepare_mcp.py to build MCP-Reborn.
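For scripted setups, a hedged Python equivalent of these steps might look like the following; it assumes prepare_mcp.py sits at the repository root and that you supply your own OpenAI key.

```python
# Sketch of the quick-start steps above; verify script names against the repository.
import os
import subprocess

os.environ["TMPDIR"] = "/tmp"
os.environ["OPENAI_API_KEY"] = "<your-openai-key>"  # placeholder, not a real key

# Install the package in editable mode, then build MCP-Reborn.
subprocess.run(["pip", "install", "-e", "."], check=True)
subprocess.run(["python", "prepare_mcp.py"], check=True)
```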
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The current release includes only offline evaluation code and a partial multimodal memory. Key components such as the multimodal descriptor, multimodal retrieval, and online learning are marked "Coming Soon" or not yet released, so the agent's full capabilities are not yet available.