pytorch-a3c by ikostrikov

PyTorch implementation of A3C reinforcement learning algorithm

Created 9 years ago

1,310 stars

Top 30.3% on SourcePulse

5 Experts Love This Project

Aravind Srinivas

Cofounder of Perplexity

Guillaume Lample

Cofounder of Mistral

James Bradbury

Head of Compute at Anthropic

Luca Antiga

CTO of Lightning AI

and 1 more!

Project Summary

This repository provides a PyTorch implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm described in the paper "Asynchronous Methods for Deep Reinforcement Learning." It is aimed at reinforcement learning researchers and practitioners who want to experiment with or deploy A3C, and it shares optimizer statistics across worker processes, as in the original paper.

How It Works

The implementation uses PyTorch to build the neural network models and run the computations. Multiple worker processes interact with the environment concurrently, collecting experience and asynchronously updating a shared global network. Because the workers explore in parallel, their samples are less correlated with one another, which can speed up learning.
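The update pattern described above can be illustrated with a minimal, standard-library sketch. Everything in it is an assumption made for illustration and is not taken from the repository: a single scalar stands in for the shared global network, a toy quadratic loss stands in for the actor-critic loss, and threads stand in for worker processes (under CPython's global interpreter lock they interleave rather than run truly in parallel).

```python
import threading

TARGET = 3.0            # minimiser of the toy loss 0.5 * (w - TARGET) ** 2
LEARNING_RATE = 0.1
STEPS_PER_WORKER = 200


def worker(shared_params, steps):
    """Repeatedly sync with the shared parameters, compute a gradient, push it back."""
    for _ in range(steps):
        # 1. Sync: take a local snapshot of the shared parameter.
        local_w = shared_params[0]
        # 2. Work locally: gradient of the toy loss at the snapshot.
        grad = local_w - TARGET
        # 3. Push: apply the gradient to the shared parameter. No lock is
        #    taken, so the gradient may come from a slightly stale snapshot.
        shared_params[0] -= LEARNING_RATE * grad


def train(num_workers=4):
    shared_params = [0.0]  # hypothetical stand-in for the global network
    threads = [
        threading.Thread(target=worker, args=(shared_params, STEPS_PER_WORKER))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_params[0]


final_w = train()
print(f"final shared parameter: {final_w:.4f}")
```

The point of the sketch is the three-step loop inside each worker: no worker waits for the others, and all of them write to the same parameters.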

Quick Start & Requirements

Primary install / run command: python3 main.py --env-name "PongDeterministic-v4" --num-processes 16
Prerequisites: Python 3, PyTorch, OpenAI Gym environments (e.g., PongDeterministic-v4, BreakoutDeterministic-v4).
Setup time: Minimal, assuming Python 3 and necessary Gym environments are installed.

Highlighted Details

Implements A3C with shared optimizer statistics, as per the original paper.
Achieves convergence on PongDeterministic-v4 in approximately 15 minutes with 16 processes.
BreakoutDeterministic-v4 training requires several hours.
Author recommends A2C, PPO, and ACKTR for potentially better performance.
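To make "shared optimizer statistics" concrete, here is a small pure-Python sketch in which the workers either share one set of moment estimates or each keep their own. It is an illustration under stated assumptions, not the repository's code: Adam is chosen only as an example (the summary does not say which optimizer is used), the workers are simulated by a sequential round-robin loop, and the names `AdamState`, `adam_update`, and `run` are hypothetical.

```python
import math


class AdamState:
    """Adam statistics for a single scalar parameter (hypothetical helper)."""

    def __init__(self):
        self.step = 0
        self.m = 0.0  # running mean of gradients
        self.v = 0.0  # running mean of squared gradients


def adam_update(param, grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam step using (and mutating) the given statistics."""
    state.step += 1
    state.m = beta1 * state.m + (1 - beta1) * grad
    state.v = beta2 * state.v + (1 - beta2) * grad * grad
    m_hat = state.m / (1 - beta1 ** state.step)  # bias-corrected first moment
    v_hat = state.v / (1 - beta2 ** state.step)  # bias-corrected second moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def run(num_workers, rounds, share_statistics):
    """Simulate workers taking turns to update one shared parameter."""
    param = 0.0
    shared = AdamState()
    # With sharing, every worker points at the same statistics object;
    # without it, each worker keeps private statistics.
    states = [shared if share_statistics else AdamState() for _ in range(num_workers)]
    for _ in range(rounds):
        for state in states:
            grad = param - 3.0  # gradient of the toy loss 0.5 * (param - 3) ** 2
            param = adam_update(param, grad, state)
    return param, states


shared_param, shared_states = run(num_workers=4, rounds=50, share_statistics=True)
private_param, private_states = run(num_workers=4, rounds=50, share_statistics=False)
print("steps seen by the shared statistics:", shared_states[0].step)
print("steps seen by each private copy:", private_states[0].step)
```

With four workers and 50 rounds, the shared statistics accumulate all 200 updates, while each private copy sees only its own 50. In a multi-process PyTorch setting the same idea would require the statistics to live in memory visible to every worker process.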

Maintenance & Community

Contributions are welcome via pull requests.
No specific community channels (Discord/Slack) or roadmap are mentioned.

Licensing & Compatibility

The repository is not explicitly licensed in the README.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README suggests that A2C, PPO, and ACKTR may outperform A3C, so A3C may not be the best choice for every task. Training on harder environments such as BreakoutDeterministic-v4 takes several hours.

Health Check

Last Commit

6 years ago

Responsiveness

Inactive

Star History

6 stars in the last 30 days