a3c_continuous by dgriff777

RL training repo for continuous action A3C with LSTM in PyTorch

Created 8 years ago

260 stars

Top 97.7% on SourcePulse

2 Experts Love This Project

Anton Osika

Cofounder of Lovable

David Ha

Cofounder of Sakana AI

Project Summary

This repository provides a PyTorch implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm, adapted for continuous action spaces. It targets challenging environments such as BipedalWalker-v3 and BipedalWalkerHardcore-v3 and is designed to train effective models quickly, even on CPU alone.
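To make "continuous action spaces" concrete, here is a minimal, framework-free sketch of how an actor-critic policy can sample real-valued actions from a Gaussian. It is not taken from the repository: the function names, the softplus parameterization of sigma, and the clipping to [-1, 1] are illustrative assumptions (the repository itself uses PyTorch).

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)); maps any real number to a positive one.
    # Assumption: used here to keep the standard deviation strictly positive.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def gaussian_log_prob(action, mu, sigma):
    # Log-density of a 1-D normal distribution N(mu, sigma^2) at `action`.
    return (-((action - mu) ** 2) / (2.0 * sigma ** 2)
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi))


def gaussian_entropy(sigma):
    # Differential entropy of N(mu, sigma^2); often added as an exploration bonus.
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def sample_action(mus, raw_sigmas, rng, low=-1.0, high=1.0):
    # `mus` and `raw_sigmas` stand in for the two outputs of a hypothetical actor head,
    # one pair per action dimension.
    actions, log_probs = [], []
    for mu, raw in zip(mus, raw_sigmas):
        sigma = softplus(raw) + 1e-5
        a = rng.gauss(mu, sigma)
        # The log-probability is computed on the unclipped sample.
        log_probs.append(gaussian_log_prob(a, mu, sigma))
        # Clip to the environment's action bounds before stepping.
        actions.append(min(max(a, low), high))
    # Independent dimensions: the joint log-probability is the sum.
    return actions, sum(log_probs)
```

For example, `sample_action([0.0] * 4, [0.0] * 4, random.Random(0))` returns four clipped actions plus their joint log-probability. The summed log-probability is the quantity an advantage-weighted policy loss would scale.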

How It Works

The project implements A3C with an LSTM layer so the policy can use information from earlier time steps. It also introduces A3G, a GPU-accelerated variant: each agent keeps its own network on the GPU, while a single shared model stays on the CPU. Agent models are quickly converted to CPU to update the shared model asynchronously, using Hogwild! training so that updates are frequent and lock-free. The README credits this design with a substantial increase in training speed.
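The lock-free update pattern described above can be sketched in a few lines. This is a toy illustration only, not the repository's code: threads stand in for worker processes, a plain list stands in for the shared model, the loss is a simple quadratic, and the GPU-to-CPU transfer is omitted. All names (`TARGET`, `worker`, `train`) are hypothetical.

```python
import threading

# Toy objective: minimize 0.5 * sum((p - t)^2), whose minimizer is TARGET.
TARGET = [1.0, -2.0, 0.5]


def gradient(params):
    # Gradient of the quadratic loss with respect to each parameter.
    return [p - t for p, t in zip(params, TARGET)]


def worker(shared, steps, lr):
    for _ in range(steps):
        # 1. Pull: copy the current shared parameters into a worker-local model.
        local = list(shared)
        # 2. Compute: evaluate the gradient on the local copy only.
        grad = gradient(local)
        # 3. Push: apply the update to the shared parameters WITHOUT a lock.
        #    Other workers may write concurrently; Hogwild!-style training
        #    tolerates these races instead of serializing updates.
        for i, g in enumerate(grad):
            shared[i] -= lr * g


def train(num_workers=4, steps=200, lr=0.1):
    shared = [0.0] * len(TARGET)
    threads = [
        threading.Thread(target=worker, args=(shared, steps, lr))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Calling `train()` returns parameters close to `TARGET` even though no worker ever waits for another. In the real setting, step 2 would be a rollout plus backpropagation on the agent's own network, and step 3 would write that agent's gradients into the shared CPU model.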

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python 3.7+, PyTorch, gym==0.26.2, spdlog, setproctitle.
Training BipedalWalker-v3: python main.py --env BipedalWalker-v3 --optimizer Adam --shared-optimizer --workers 8 --amsgrad --stop-when-solved --model-300-check --tensorboard-logger
Training BipedalWalkerHardcore-v3: python main.py --env BipedalWalkerHardcore-v3 --optimizer Adam --shared-optimizer --workers 18 --amsgrad --stop-when-solved --model-300-check --tensorboard-logger
Evaluation: python gym_eval.py --env BipedalWalkerHardcore-v3 --num-episodes 100
Official Docs: None linked. The commands above suggest a conventional layout, with main.py for training and gym_eval.py for evaluation.

Highlighted Details

Claims training BipedalWalkerHardcore-v3 to average 300+ reward in 20-40 minutes on CPU.
A3G architecture designed for substantially accelerated training, especially with larger models or pixel-based observations.
Solves BipedalWalker-v3 and the harder BipedalWalkerHardcore-v3, achieving average rewards over 300.
Uses spdlog for faster logging and setproctitle to give worker processes identifiable names.
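As a rough illustration of what "average reward over 300" means in practice, the sketch below tracks a rolling mean of episode rewards and reports when it reaches a threshold. The 300 threshold comes from the claims above and the 100-episode window mirrors the evaluation command. The class name and its methods are hypothetical, and this is not how the repository's --stop-when-solved flag is known to be implemented.

```python
from collections import deque


class SolvedTracker:
    """Rolling-average check over the most recent `window` episode rewards."""

    def __init__(self, threshold=300.0, window=100):
        self.threshold = threshold
        # deque(maxlen=...) silently discards the oldest reward once full.
        self.rewards = deque(maxlen=window)

    def add(self, episode_reward):
        # Record one finished episode and report whether the task now counts as solved.
        self.rewards.append(float(episode_reward))
        return self.is_solved()

    def average(self):
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.0

    def is_solved(self):
        # Require a full window so a single lucky episode cannot trigger a stop.
        full = len(self.rewards) == self.rewards.maxlen
        return full and self.average() >= self.threshold
```

A training loop could call `tracker.add(reward)` after each evaluation episode and stop once it returns `True`.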

Maintenance & Community

The README mentions no specific contributors, sponsorships, or community links (Discord/Slack).
The project appears to be a personal implementation; it cites ikostrikov/pytorch-a3c and andrewliao11/pytorch-a3c-mujoco as references.

Licensing & Compatibility

The README does not explicitly state a license. The referenced repositories use MIT and Apache 2.0 licenses, respectively. Compatibility for commercial use is undetermined without a clear license.

Limitations & Caveats

The README describes the project as a continuous action space version of A3C LSTM and highlights its training speed gains, but it does not document any limitations. In particular, performance on other environments is not reported, and the project rests on a single author, so its bus factor is low.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days