async-rl  by coreylynch

TensorFlow/Keras implementation of asynchronous 1-step Q-learning from the "Asynchronous Methods for Deep Reinforcement Learning" paper

created 9 years ago
1,011 stars

Top 37.6% on sourcepulse

Project Summary

This repository provides a TensorFlow and Keras implementation of the asynchronous 1-step Q-learning algorithm described in the paper "Asynchronous Methods for Deep Reinforcement Learning". It is aimed at researchers and practitioners who want to train deep reinforcement learning agents on standard hardware: instead of storing past transitions in an experience replay buffer, it runs multiple actor-learner threads in parallel, which keeps memory use low.

How It Works

The implementation uses multiple actor-learner threads to stabilize learning, avoiding the high memory requirements of experience replay. Each thread interacts with its own copy of an OpenAI Gym environment (specifically Atari), collects experience, and updates a shared global network. Because the threads are in different parts of the environment at any given time, their updates are less correlated than those of a single agent, which is the role experience replay otherwise plays.
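The thread structure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's code: a small Q-table stands in for the shared network, a hypothetical five-state chain stands in for the Atari environment, and a lock guards each update for simplicity. The constants and the names `step`, `actor_learner`, and `train` are invented for this example. The paper's algorithm also uses a target network and accumulated gradients, both omitted here.

```python
import random
import threading

N_STATES = 5      # toy chain: states 0..4, episode ends on reaching state 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor (illustrative value)
ALPHA = 0.1       # learning rate (illustrative value)


def step(state, action):
    """Hypothetical deterministic chain environment (stand-in for Atari)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def actor_learner(q, lock, episodes, epsilon, seed):
    """One thread: act epsilon-greedily and apply 1-step Q updates to shared q."""
    rng = random.Random(seed)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Greedy action with random tie-breaking, or a random action.
            best = max(q[state])
            greedy = [a for a in ACTIONS if q[state][a] == best]
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = rng.choice(greedy)
            next_state, reward, done = step(state, action)
            with lock:
                # 1-step target: r at terminal states, else r + gamma * max_a' Q(s', a').
                target = reward if done else reward + GAMMA * max(q[next_state])
                q[state][action] += ALPHA * (target - q[state][action])
            state = next_state


def train(num_threads=4, episodes=200):
    """Run several actor-learners against one shared Q-table and return it."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # shared parameters
    lock = threading.Lock()
    threads = [
        # Each thread gets its own exploration rate and random seed.
        threading.Thread(target=actor_learner,
                         args=(q, lock, episodes, 0.1 * (i + 1), i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return q


if __name__ == "__main__":
    for state, values in enumerate(train()):
        print(state, [round(v, 3) for v in values])
```

No transition is stored after it has been used for an update, which is why this scheme needs no replay buffer. Giving each thread a different exploration rate is one simple way to diversify the experience the threads collect.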

Quick Start & Requirements

  • Install via pip: pip install tensorflow gym[atari] scikit-image
  • Requires Python 3.x.
  • Training command: python async_dqn.py --experiment breakout --game "Breakout-v0" --num_concurrent 8
  • TensorBoard visualization: tensorboard --logdir /tmp/summaries/breakout
  • Evaluation command: python async_dqn.py --experiment breakout --testing True --checkpoint_path /tmp/breakout.ckpt-2690000 --num_eval_episodes 100
  • Official Gym Atari setup: https://github.com/openai/gym#atari

Highlighted Details

  • Implements 1-step Q-learning from "Asynchronous Methods for Deep Reinforcement Learning".
  • Uses TensorFlow, Keras, and OpenAI Gym for Atari environments.
  • Designed to run on modest hardware (e.g., MacBook with 4GB RAM).
  • Includes functionality for training, TensorBoard visualization, and evaluation.

Maintenance & Community

This project appears to be a personal learning project with no explicit mention of ongoing maintenance, community channels, or notable contributors. The author welcomes feedback.

Licensing & Compatibility

The README does not specify a license. In the absence of a license, default copyright applies, which may restrict commercial use or integration into closed-source projects.

Limitations & Caveats

The author notes that performance can vary significantly between runs, so reliable evaluation calls for several training runs with different random seeds rather than a single run. The implementation is presented as a learning project and may not be production-ready.
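One way to act on the multiple-seeds advice is to report the spread across runs rather than a single score. The helper below is a small sketch of that idea. The function name `summarize_runs` is hypothetical, and the scores in the usage example are placeholder numbers, not results from this repository.

```python
import statistics


def summarize_runs(scores_by_seed):
    """Summarize final evaluation scores from runs that differ only in seed.

    scores_by_seed maps a seed to the mean evaluation score of that run.
    """
    scores = list(scores_by_seed.values())
    return {
        "runs": len(scores),
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


if __name__ == "__main__":
    # Placeholder scores for three seeds, for illustration only.
    print(summarize_runs({0: 10.0, 1: 20.0, 2: 30.0}))
```

A large standard deviation relative to the mean indicates that any single run, good or bad, says little about the method itself.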

Health Check

  • Last commit: 7 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
