Video-Pre-Training by openai

Research-paper code release: Minecraft agents that learn to act via video pre-training (VPT)

created 3 years ago
1,487 stars

Top 28.2% on sourcepulse

Project Summary

This repository provides code and pre-trained models for Video Pre-Training (VPT), a method for training agents to act in environments by watching unlabeled online videos. It's primarily aimed at researchers and developers interested in imitation learning and reinforcement learning in complex environments like Minecraft. The key benefit is enabling agents to learn sophisticated behaviors from diverse video data without explicit task supervision.

How It Works

VPT uses a transformer-based architecture that maps video frames to predicted actions. The core recipe is semi-supervised imitation learning: a small amount of contractor gameplay with recorded keyboard and mouse inputs is used to train an inverse dynamics model (IDM), which then pseudo-labels a large corpus of unlabeled Minecraft gameplay videos with the actions most likely taken between frames. A general-purpose "foundation model" is trained by behavioral cloning on this pseudo-labeled data, and can subsequently be fine-tuned on smaller, task-specific datasets or with reinforcement learning to optimize for specific rewards, allowing it to adapt to new tasks efficiently.
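The sketch below illustrates that recipe in miniature: a toy IDM labels consecutive frames of an unlabeled clip, and a toy policy is then behaviorally cloned on those pseudo-labels. The network definitions, action space, and random "video" tensor are placeholders for illustration only, not the repository's real architectures or data.

```python
# Toy illustration of the VPT recipe: (1) an inverse dynamics model (IDM)
# pseudo-labels unlabeled video, (2) a policy is behaviorally cloned on the
# pseudo-labels. All shapes, networks, and data below are placeholders.
import torch
import torch.nn as nn

N_ACTIONS = 16                 # toy discrete action space (assumption)
FRAME = 3 * 64 * 64            # flattened toy frame size

class ToyIDM(nn.Module):
    """Predicts the action taken between two consecutive frames."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(2 * FRAME, N_ACTIONS))

    def forward(self, frame_t, frame_tp1):
        return self.net(torch.cat([frame_t, frame_tp1], dim=1))

class ToyPolicy(nn.Module):
    """Predicts an action from the current frame (stand-in for the foundation model)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(FRAME, N_ACTIONS))

    def forward(self, frame):
        return self.net(frame)

idm, policy = ToyIDM(), ToyPolicy()   # a real IDM would be pre-trained on labeled contractor data
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# 1) Pseudo-label an unlabeled "video" (random tensors stand in for decoded frames).
frames = torch.rand(9, 3, 64, 64)
with torch.no_grad():
    pseudo_actions = idm(frames[:-1], frames[1:]).argmax(dim=1)

# 2) Behavioral cloning: train the policy to reproduce the pseudo-labels.
for _ in range(3):
    loss = loss_fn(policy(frames[:-1]), pseudo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```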

Quick Start & Requirements

  • Install MineRL environment: pip install git+https://github.com/minerllabs/minerl
  • Install project requirements: pip install -r requirements.txt
  • Note: requirements.txt pins torch==1.9.0, which is incompatible with Python 3.10+. On Python 3.10 or newer, install a more recent PyTorch (pip install torch), but model behavior may differ slightly from the pinned version.
  • Run an agent: python run_agent.py --model [path to .model file] --weights [path to .weights file] (a hedged sketch of the equivalent programmatic call appears after this list)
  • Official docs: https://minerl.io/docs/
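For readers who prefer to drive the agent from Python rather than the CLI, the following is a hedged sketch of roughly what run_agent.py does. The class and method names (MineRLAgent, load_weights, get_action), the pickled-hyperparameter layout, the file names, and the environment id are assumptions drawn from the repository layout and may differ from the actual code.

```python
# Hedged sketch of running a pre-trained VPT agent programmatically.
# Names marked as assumptions may not match the repository exactly.
import pickle

import gym
import minerl  # noqa: F401  (importing registers the MineRL environments)

from agent import MineRLAgent  # agent wrapper shipped in this repository (assumption)

MODEL_PATH = "foundation-model-1x.model"      # example .model file name
WEIGHTS_PATH = "foundation-model-1x.weights"  # example .weights file name

env = gym.make("MineRLBasaltFindCave-v0")     # any MineRL env the agent was built for

# The .model file is a pickled dict of architecture hyperparameters;
# the .weights file holds the matching parameter tensors (assumed layout below).
with open(MODEL_PATH, "rb") as f:
    agent_parameters = pickle.load(f)

agent = MineRLAgent(
    env,
    policy_kwargs=agent_parameters["model"]["args"]["net"]["args"],
    pi_head_kwargs=agent_parameters["model"]["args"]["pi_head_opts"],
)
agent.load_weights(WEIGHTS_PATH)

obs = env.reset()
done = False
while not done:
    action = agent.get_action(obs)            # returns a MineRL-style action dict
    obs, reward, done, info = env.step(action)
    env.render()
env.close()
```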

Highlighted Details

  • Offers pre-trained models for various tasks including behavioral cloning (foundational, house, early game) and RL-tuned models (diamond pickaxe acquisition).
  • Includes code for running an Inverse Dynamics Model (IDM) to predict actions from video.
  • Provides scripts for fine-tuning models with behavioral cloning on custom datasets (see the data-loading sketch after this list).
  • Details the data collection process and prompts used for the MineRL BASALT competition datasets.
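As a concrete starting point for the fine-tuning item above, here is a minimal sketch of loading one demonstration, assuming a VPT-style data layout of paired files (an .mp4 video plus a .jsonl file containing one action record per frame). The paths and field alignment are hypothetical and simplified; the repository's own data loader handles this more carefully.

```python
# Minimal sketch: read one (video, actions) demonstration pair for BC fine-tuning.
# Paths are hypothetical; assumes one JSON action record per video frame.
import json

import cv2  # pip install opencv-python

VIDEO_PATH = "demos/episode-0001.mp4"      # hypothetical demonstration video
ACTIONS_PATH = "demos/episode-0001.jsonl"  # hypothetical per-frame action log

# One JSON object per line, aligned with the corresponding video frame.
with open(ACTIONS_PATH) as f:
    actions = [json.loads(line) for line in f if line.strip()]

# Decode the video into RGB frames.
cap = cv2.VideoCapture(VIDEO_PATH)
frames = []
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
cap.release()

# Keep only indices present in both streams, then pair them up.
n = min(len(frames), len(actions))
dataset = list(zip(frames[:n], actions[:n]))
print(f"Loaded {len(dataset)} (frame, action) pairs for behavioral cloning")
```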

Maintenance & Community

  • Developed by OpenAI with contributions from various researchers.
  • Code prepared by Anssi Kanervisto for the MineRL BASALT competition.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The underlying MineRL environment may have its own licensing.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The behavioral cloning fine-tuning script is described as a "rough demonstration" rather than an exact recreation of the original paper's experiments; it only computes gradients over single steps and processes data slowly.
  • The pinned PyTorch version (1.9.0) restricts compatibility with newer Python versions (3.10+).

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 48 stars in the last 90 days
