async-rl by coreylynch

TensorFlow/Keras implementation of asynchronous 1-step Q-learning from the "Asynchronous Methods for Deep Reinforcement Learning" paper

Created 9 years ago

1,008 stars

Top 37.0% on SourcePulse

9 Experts Love This Project

Aravind Srinivas

Cofounder of Perplexity

François Chollet

Author of Keras; Cofounder of Ndea, ARC Prize

Anastasis Germanidis

Cofounder of Runway

Junxiao Song

Research Scientist at DeepSeek

and 5 more!

Project Summary

This repository provides a TensorFlow and Keras implementation of the 1-step Q-learning algorithm described in the "Asynchronous Methods for Deep Reinforcement Learning" paper. It targets researchers and practitioners interested in efficient deep reinforcement learning, offering a memory-efficient approach that runs on standard hardware by using multiple actor-learner threads instead of experience replay.
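For reference, the 1-step Q-learning objective named above can be written as follows. This is the standard form from the paper rather than something taken from this repository's code; the symbols are the usual ones: \theta for the online network parameters, \theta^{-} for the periodically updated target network parameters, and \gamma for the discount factor. For a terminal next state the target reduces to r.

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}),
\qquad
L(\theta) = \bigl( y - Q(s, a; \theta) \bigr)^{2}
```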

How It Works

The implementation runs several actor-learner threads in parallel instead of storing transitions in an experience replay buffer, which avoids replay's high memory requirements. Each thread interacts with its own OpenAI Gym Atari environment, collects experience, and applies its updates to a shared global network. Because the threads are exploring different parts of the game at any given time, their updates are less correlated than those of a single agent, which is what stabilizes learning in place of replay.
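The thread structure described above can be illustrated with a deliberately small sketch. It is not the repository's code: it assumes a toy five-state chain in place of an Atari environment and a shared Q-table in place of the shared network, and every name in it (`step`, `actor_learner`, `train`, `TARGET_SYNC_EVERY`) is hypothetical. What it keeps from the method is the shape of the loop: several threads act in their own environment copies, apply 1-step Q-learning updates to shared values, and bootstrap from a target copy that is refreshed periodically.

```python
import random
import threading

N_STATES = 5            # toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)        # 0 = move left, 1 = move right
GAMMA = 0.9             # discount factor
ALPHA = 0.1             # learning rate
TARGET_SYNC_EVERY = 50  # refresh the target table every 50 shared steps


def step(state, action):
    """Toy stand-in for an environment step: (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def actor_learner(shared, lock, seed, epsilon, n_steps):
    """One thread: act epsilon-greedily, update the shared Q-table."""
    rng = random.Random(seed)
    state = 0
    for _ in range(n_steps):
        q_row = list(shared["q"][state])  # snapshot to avoid mid-read changes
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            best = max(q_row)
            action = rng.choice([a for a in ACTIONS if q_row[a] == best])
        next_state, reward, done = step(state, action)
        with lock:
            # 1-step target, bootstrapped from the (older) target table.
            bootstrap = 0.0 if done else GAMMA * max(shared["target_q"][next_state])
            td_error = reward + bootstrap - shared["q"][state][action]
            shared["q"][state][action] += ALPHA * td_error
            shared["steps"] += 1
            if shared["steps"] % TARGET_SYNC_EVERY == 0:
                shared["target_q"] = [row[:] for row in shared["q"]]
        state = 0 if done else next_state


def train(num_threads=4, n_steps=3000):
    """Run several actor-learner threads against one shared Q-table."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    shared = {"q": q, "target_q": [row[:] for row in q], "steps": 0}
    lock = threading.Lock()
    threads = [
        # Each thread gets its own seed and exploration rate.
        threading.Thread(
            target=actor_learner,
            args=(shared, lock, seed, 0.1 + 0.1 * seed, n_steps),
        )
        for seed in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["q"]


if __name__ == "__main__":
    for s, row in enumerate(train()):
        print(s, [round(v, 3) for v in row])
```

The lock keeps the sketch easy to reason about. Whether the real implementation synchronizes its updates this way is not stated in the summary above, so treat that as a simplification.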

Quick Start & Requirements

Install via pip: pip install tensorflow "gym[atari]" scikit-image (the quotes keep shells such as zsh from interpreting the square brackets)
Requires Python 3.x.
Training command: python async_dqn.py --experiment breakout --game "Breakout-v0" --num_concurrent 8
TensorBoard visualization: tensorboard --logdir /tmp/summaries/breakout
Evaluation command: python async_dqn.py --experiment breakout --testing True --checkpoint_path /tmp/breakout.ckpt-2690000 --num_eval_episodes 100
Official Gym Atari setup: https://github.com/openai/gym#atari

Highlighted Details

Implements 1-step Q-learning from "Asynchronous Methods for Deep Reinforcement Learning".
Uses TensorFlow, Keras, and OpenAI Gym for Atari environments.
Designed to run on modest hardware (e.g., MacBook with 4GB RAM).
Includes functionality for training, TensorBoard visualization, and evaluation.

Maintenance & Community

This project appears to be a personal learning project with no explicit mention of ongoing maintenance, community channels, or notable contributors. The author welcomes feedback.

Licensing & Compatibility

The README does not specify a license. Without one, default copyright generally applies, which may restrict commercial use or integration into closed-source projects.

Limitations & Caveats

The author notes that performance can vary significantly between runs, suggesting multiple experiments with different seeds are advisable for reliable evaluation. The implementation is presented as a learning project and may not be production-ready.
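One way to act on the advice about multiple seeds is to report the mean and spread of the final evaluation score across runs instead of a single number. The sketch below assumes the per-seed scores have already been collected by hand; `summarize_runs` and `fake_evaluation` are hypothetical helpers, the scores generated in the demo are synthetic placeholders, and nothing here implies that `async_dqn.py` exposes a seed option.

```python
import random
import statistics


def summarize_runs(scores_by_seed):
    """Return (mean, sample standard deviation) of per-seed scores."""
    scores = list(scores_by_seed.values())
    mean = statistics.mean(scores)
    # A single run has no measurable spread, so report 0.0 in that case.
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, spread


def fake_evaluation(seed):
    """Hypothetical stand-in for one train-and-evaluate run (synthetic score)."""
    return random.Random(seed).uniform(0.0, 100.0)


if __name__ == "__main__":
    scores = {seed: fake_evaluation(seed) for seed in range(5)}
    mean, spread = summarize_runs(scores)
    print(f"mean={mean:.1f} stdev={spread:.1f} over {len(scores)} seeds")
```

A large standard deviation relative to the mean is a sign that a single run's score should not be read as representative.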

Health Check

Last Commit

7 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days