random-network-distillation by openai

RL research paper code

Created 7 years ago

928 stars

Top 39.4% on SourcePulse

4 Experts Love This Project

Aravind Srinivas

Cofounder of Perplexity

Evan Hubinger

Head of Alignment Stress-Testing at Anthropic

Jerry Tworek

VP Research at OpenAI

Junxiao Song

Research Scientist at DeepSeek

Project Summary

This repository provides the code for the paper "Exploration by Random Network Distillation" (RND). RND encourages reinforcement learning agents to explore by giving them a bonus reward for reaching states where a trained predictor network poorly predicts the output of a fixed, randomly initialized network; such states tend to be ones the agent has rarely visited. This is particularly useful in sparse-reward environments such as Montezuma's Revenge.

How It Works

RND uses two neural networks: a fixed, randomly initialized target network and a predictor network. The predictor is trained to reproduce the target network's output on the states the agent encounters. The prediction error (the mismatch between the two networks' outputs on a state) serves as an intrinsic reward: it is large for states unlike those the predictor has been trained on, so the agent is drawn toward unfamiliar states, which drives exploration.
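The mechanism described above can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the repository's implementation: the class name `RNDSketch`, the tiny linear predictor, the tanh target, the learning rate, and the random "observations" are all hypothetical choices made for illustration, and the repository's actual networks, which operate on Atari frames, are not reproduced here.

```python
import numpy as np


class RNDSketch:
    """Toy random network distillation (illustrative, hypothetical names)."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Target network: randomly initialized and never updated.
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network: trained to match the target's output.
        self.w_pred = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def _error(self, obs):
        # Difference between predictor output and fixed target output.
        return obs @ self.w_pred - np.tanh(obs @ self.w_target)

    def intrinsic_reward(self, obs):
        # Per-state mean squared prediction error, used as the bonus.
        return np.mean(self._error(obs) ** 2, axis=1)

    def update(self, obs):
        # One gradient-descent step on the squared prediction error
        # (up to a constant factor); only the predictor changes.
        grad = obs.T @ self._error(obs) / len(obs)
        self.w_pred -= self.lr * grad


rng = np.random.default_rng(1)
familiar = rng.uniform(-1.0, 1.0, size=(4, 8))  # states visited often
novel = rng.uniform(-1.0, 1.0, size=(4, 8))     # states never trained on

model = RNDSketch(obs_dim=8)
before = model.intrinsic_reward(familiar)
for _ in range(2000):
    model.update(familiar)
after = model.intrinsic_reward(familiar)
novel_reward = model.intrinsic_reward(novel)

print("familiar, before training:", before.mean())
print("familiar, after training: ", after.mean())
print("novel, after training:    ", novel_reward.mean())
```

In this toy setting the bonus for the repeatedly visited states shrinks as the predictor fits them, while states it was never trained on keep a larger bonus, which is the behaviour the paragraph above relies on.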

Quick Start & Requirements

Run command: python run_atari.py --gamma_ext 0.999
Prerequisites: Python, Atari environments, and MPI (needed only for multi-GPU or multi-machine training).
To train on 1024 parallel environments using 8 GPUs (8 MPI processes with 128 environments each): mpiexec -n 8 python run_atari.py --num_env 128 --gamma_ext 0.999
Blog post and videos are available.

Highlighted Details

Implements the Random Network Distillation (RND) algorithm for intrinsic motivation.
Designed for Atari environments, with a focus on sparse-reward tasks like Montezuma's Revenge.
Supports distributed training via MPI for scaling across multiple GPUs and machines.

Maintenance & Community

Status: Archive (code is provided as-is, no updates expected).
Notable contributors: Yuri Burda, Harri Edwards, Amos Storkey, Oleg Klimov.

Licensing & Compatibility

License: Not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The project is archived and will not receive further updates. The license is not specified, which may pose a barrier to commercial adoption or integration into closed-source projects.

Health Check

Last Commit

5 years ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days