pytorch-a3c by ikostrikov

PyTorch implementation of A3C reinforcement learning algorithm

created 8 years ago
1,283 stars

Top 31.7% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm described in the paper "Asynchronous Methods for Deep Reinforcement Learning." It is aimed at reinforcement learning researchers and practitioners who want to experiment with or deploy A3C. As in the original paper, it uses an optimizer whose statistics are shared across all worker processes.
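The "advantage" in A3C is the gap between the discounted n-step return a worker observes and the critic's value estimate for the same state. A minimal sketch of that computation, assuming a single rollout stored as plain Python lists (the function name and signature are hypothetical and not taken from the repository):

```python
from typing import List, Tuple


def nstep_returns_and_advantages(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """Compute n-step discounted returns R_t and advantages A_t = R_t - V(s_t).

    rewards[t] and values[t] belong to step t of one rollout. bootstrap_value
    is the critic's estimate V(s_T) for the state after the last step; pass
    0.0 if the episode terminated inside the rollout.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk backwards so each return reuses the one computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Advantage: how much better the observed return was than the critic predicted.
    advantages = [r - v for r, v in zip(returns, values)]
    return returns, advantages


# Example: a 3-step rollout that ends in a terminal state (bootstrap 0.0).
returns, advantages = nstep_returns_and_advantages(
    rewards=[0.0, 0.0, 1.0], values=[0.5, 0.25, 0.75], bootstrap_value=0.0, gamma=0.5
)
print(returns)     # [0.25, 0.5, 1.0]
print(advantages)  # [-0.25, 0.25, 0.25]
```

In the paper's formulation, the actor's loss weights -log pi(a_t | s_t) by A_t and the critic regresses V(s_t) toward R_t; the repository's own training loop may estimate or weight these terms differently.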

How It Works

The implementation uses PyTorch to build the neural network models and run the computations. Multiple worker processes interact with their own copies of the environment concurrently, collecting experience and applying updates to a shared global network. Running the workers in parallel speeds up learning: because each worker sees different states at any given moment, the combined updates are less correlated than consecutive samples from a single agent, and the workers explore different parts of the environment at the same time.
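The worker pattern described above can be sketched with Python threads standing in for processes and a toy quadratic objective standing in for the policy and value losses. Each hypothetical worker copies the global parameters, computes a gradient from its own noisy samples, and writes the update back to the shared parameters without a lock. This is an illustration under those assumptions only; the repository itself works with separate processes and a PyTorch model held in shared memory, not with this code.

```python
import random
import threading
from typing import List

# Hypothetical optimum the workers are trying to reach (stands in for good network weights).
TARGET = [1.0, -2.0, 0.5]


def worker(shared: List[float], worker_id: int, steps: int, lr: float) -> None:
    rng = random.Random(worker_id)  # each worker draws its own noisy samples
    for _ in range(steps):
        # 1. Sync: copy the current global parameters into a local copy.
        local = list(shared)
        # 2. "Experience": a noisy observation of the target, standing in for a rollout.
        sample = [t + rng.gauss(0.0, 0.1) for t in TARGET]
        # 3. Gradient of 0.5 * ||local - sample||^2 with respect to the local copy.
        grad = [w - s for w, s in zip(local, sample)]
        # 4. Apply the worker's gradient to the shared parameters without locking.
        for i, g in enumerate(grad):
            shared[i] -= lr * g


def train(num_workers: int = 4, steps: int = 500, lr: float = 0.05) -> List[float]:
    shared = [0.0] * len(TARGET)  # the "global network" every worker updates
    threads = [
        threading.Thread(target=worker, args=(shared, k, steps, lr))
        for k in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    print([round(w, 2) for w in train()])
```

Threads keep the example dependency-free; under CPython's global interpreter lock they do not compute in parallel, which is one reason a real implementation uses processes instead.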

Quick Start & Requirements

  • Primary install / run command: python3 main.py --env-name "PongDeterministic-v4" --num-processes 16
  • Prerequisites: Python 3, PyTorch, OpenAI Gym environments (e.g., PongDeterministic-v4, BreakoutDeterministic-v4).
  • Setup time: Minimal, assuming Python 3 and necessary Gym environments are installed.

Highlighted Details

  • Implements A3C with shared optimizer statistics, as per the original paper.
  • Achieves convergence on PongDeterministic-v4 in approximately 15 minutes with 16 processes.
  • BreakoutDeterministic-v4 training requires several hours.
  • The author recommends A2C, PPO, and ACKTR as alternatives that may perform better.
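Sharing optimizer statistics means every worker updates the same Adam step counter and moment estimates instead of keeping private copies. The sketch below assumes plain Python lists for parameters and gradients; `SharedAdam` here is a hypothetical class, and the lock is a simplification for clarity (A3C parameter updates are usually applied lock-free). PyTorch versions of this idea typically allocate the state tensors up front and call `Tensor.share_memory_()` on them so that all worker processes see a single copy.

```python
import math
import threading
from typing import List, Tuple


class SharedAdam:
    """Adam whose step count and moment estimates live in one object used by all workers.

    Sketch only: parameters and gradients are plain Python lists, not PyTorch tensors.
    """

    def __init__(self, params: List[float], lr: float = 1e-3,
                 betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-8) -> None:
        self.params = params  # updated in place, so callers see the new values
        self.lr = lr
        self.beta1, self.beta2 = betas
        self.eps = eps
        # The shared optimizer statistics: one copy, updated by every worker.
        self.step_count = 0
        self.exp_avg = [0.0] * len(params)
        self.exp_avg_sq = [0.0] * len(params)
        self._lock = threading.Lock()

    def step(self, grads: List[float]) -> None:
        with self._lock:  # simplification: keeps the statistics consistent in this sketch
            self.step_count += 1
            bias_c1 = 1 - self.beta1 ** self.step_count
            bias_c2 = 1 - self.beta2 ** self.step_count
            for i, g in enumerate(grads):
                self.exp_avg[i] = self.beta1 * self.exp_avg[i] + (1 - self.beta1) * g
                self.exp_avg_sq[i] = self.beta2 * self.exp_avg_sq[i] + (1 - self.beta2) * g * g
                m_hat = self.exp_avg[i] / bias_c1
                v_hat = self.exp_avg_sq[i] / bias_c2
                self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


# Hypothetical usage: two workers push gradients into one shared optimizer.
params = [0.0]
opt = SharedAdam(params, lr=0.1)
workers = [threading.Thread(target=opt.step, args=([1.0],)) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(opt.step_count)  # 2
```

Because both workers went through one optimizer, `step_count` reaches 2 and the bias correction reflects their combined history; with per-worker optimizers, each would report 1.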

Maintenance & Community

  • Contributions are welcome via pull requests.
  • No specific community channels (Discord/Slack) or roadmap are mentioned.

Licensing & Compatibility

  • The repository is not explicitly licensed in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README points users to A2C, PPO, and ACKTR as potentially better-performing alternatives, so A3C may not be the best choice for every task. Training on more complex environments such as BreakoutDeterministic-v4 takes several hours.

Health Check

  • Last commit: 5 years ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 20 stars in the last 90 days
