RL training repo for continuous action A3C with LSTM in PyTorch
This repository provides a PyTorch implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm, specifically tailored for continuous action spaces. It aims to solve challenging environments like BipedalWalker-v3 and BipedalWalkerHardcore-v3, enabling users to train effective models quickly, even on CPU.
How It Works
The project implements A3C with an LSTM component for handling sequential observations. It also introduces a novel A3G architecture that leverages the GPU for accelerated training: A3G keeps an individual agent network on the GPU for each worker while sharing a model on the CPU. Agent models are rapidly transferred to the CPU for asynchronous updates to the shared model, using a Hogwild! training approach for frequent, lock-free updates that significantly boost training speed.
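Below is a minimal, self-contained sketch of that pattern, not the repo's actual code: a shared CPU model with parameters in shared memory, a per-worker GPU copy, and a lock-free gradient hand-off. The `TinyActorCritic` class, the `push_grads_to_cpu` helper, the placeholder loss, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

class TinyActorCritic(nn.Module):
    """Illustrative stand-in for the repo's LSTM actor-critic."""
    def __init__(self, obs_dim=24, act_dim=4):
        super().__init__()
        self.body = nn.Linear(obs_dim, 64)
        self.lstm = nn.LSTMCell(64, 64)
        self.mu = nn.Linear(64, act_dim)  # continuous-action mean head
        self.v = nn.Linear(64, 1)         # value head

def push_grads_to_cpu(local_model, shared_model):
    # Hand the worker's GPU gradients to the shared CPU model.
    for lp, sp in zip(local_model.parameters(), shared_model.parameters()):
        sp.grad = lp.grad.cpu() if lp.grad is not None else None

def worker(rank, shared_model, optimizer, device):
    local_model = TinyActorCritic().to(device)  # per-worker GPU copy
    for _ in range(10):  # stands in for the real training loop
        # 1) Sync: pull the latest shared CPU weights onto the GPU.
        local_model.load_state_dict(shared_model.state_dict())
        # 2) Placeholder loss; the real code rolls out episodes and
        #    computes the A3C policy and value losses here.
        obs = torch.randn(1, 24, device=device)
        hx = cx = torch.zeros(1, 64, device=device)
        hx, cx = local_model.lstm(torch.relu(local_model.body(obs)), (hx, cx))
        loss = local_model.v(hx).pow(2).mean()
        local_model.zero_grad()
        loss.backward()
        # 3) Lock-free (Hogwild!) update of the shared CPU model.
        push_grads_to_cpu(local_model, shared_model)
        optimizer.step()

if __name__ == "__main__":
    mp.set_start_method("spawn")          # needed when workers use CUDA
    device = "cuda" if torch.cuda.is_available() else "cpu"
    shared_model = TinyActorCritic()      # lives on the CPU
    shared_model.share_memory()           # put parameters in shared memory
    optimizer = torch.optim.Adam(shared_model.parameters(), lr=1e-4)
    procs = [mp.Process(target=worker, args=(r, shared_model, optimizer, device))
             for r in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```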
Quick Start & Requirements
pip install -r requirements.txt
Key dependencies: gym==0.26.2, spdlog, setproctitle
python main.py --env BipedalWalker-v3 --optimizer Adam --shared-optimizer --workers 8 --amsgrad --stop-when-solved --model-300-check --tensorboard-logger
python main.py --env BipedalWalkerHardcore-v3 --optimizer Adam --shared-optimizer --workers 18 --amsgrad --stop-when-solved --model-300-check --tensorboard-logger
python gym_eval.py --env BipedalWalkerHardcore-v3 --num-episodes 100
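The --shared-optimizer and --amsgrad flags point to an optimizer whose statistics live in shared memory, as in the referenced ikostrikov/pytorch-a3c. Below is a sketch of that common pattern; the class name `SharedAdam` and the exact state layout are assumptions, and the details vary across PyTorch versions, so this repo's implementation may differ.

```python
import torch

class SharedAdam(torch.optim.Adam):
    """Adam whose per-parameter statistics live in shared memory, so all
    worker processes read and update the same moments. Sketch of the
    pattern popularized by ikostrikov/pytorch-a3c; illustration only."""

    def __init__(self, params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0, amsgrad=True):
        super().__init__(params, lr=lr, betas=betas, eps=eps,
                         weight_decay=weight_decay, amsgrad=amsgrad)
        # Pre-initialize the optimizer state so it exists before forking.
        for group in self.param_groups:
            for p in group["params"]:
                state = self.state[p]
                state["step"] = torch.zeros(1)
                state["exp_avg"] = torch.zeros_like(p.data)
                state["exp_avg_sq"] = torch.zeros_like(p.data)
                if amsgrad:
                    state["max_exp_avg_sq"] = torch.zeros_like(p.data)

    def share_memory(self):
        # Move every state tensor into shared memory so child processes
        # see (and mutate) the same optimizer statistics.
        for group in self.param_groups:
            for p in group["params"]:
                for t in self.state[p].values():
                    t.share_memory_()

# Usage (assumed): construct before spawning workers, then share state.
# opt = SharedAdam(shared_model.parameters(), amsgrad=True)
# opt.share_memory()
```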
Highlighted Details
Uses spdlog for faster logging and setproctitle for process management.
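For illustration, setproctitle is typically used to give each worker process a readable name; the title format below is an assumption, not necessarily what this repo uses.

```python
import setproctitle

rank = 0  # hypothetical worker index
# Rename the process so `ps`/`htop` show which A3C worker this is.
setproctitle.setproctitle(f"a3c_worker_{rank}")
```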
Maintenance & Community
Credits ikostrikov/pytorch-a3c and andrewliao11/pytorch-a3c-mujoco as references.
Licensing & Compatibility
Limitations & Caveats
The project is described as a continuous-action version of A3C with LSTM and highlights training-speed gains, but it does not document specific limitations or potential issues, such as performance on environments other than BipedalWalker or the bus factor of a single author.