a3c_continuous by dgriff777

RL training repo for continuous action A3C with LSTM in PyTorch

created 7 years ago
258 stars

Top 98.6% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of the Asynchronous Advantage Actor-Critic (A3C) algorithm for continuous action spaces. It targets the BipedalWalker-v3 and BipedalWalkerHardcore-v3 environments and is reported to train effective models quickly, even on CPU.

How It Works

The project implements A3C with an LSTM layer to handle sequential observations. It also introduces a variant called A3G, which uses the GPU to accelerate training. In A3G, each agent keeps its own network on the GPU while the shared model stays on the CPU. Agent models are quickly transferred to the CPU to update the shared model asynchronously, using a Hogwild! training approach so that updates are frequent and lock-free. This is reported to boost training speed significantly.
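The lock-free update pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's code: plain Python lists and threads stand in for PyTorch tensors and worker processes, a toy quadratic loss replaces the actor-critic loss, and every name (`worker`, `train`, `shared_params`, `TARGET`) is hypothetical.

```python
import threading

# Hypothetical stand-ins: in the real project the shared model sits on the CPU
# and each agent keeps its own network copy on the GPU.
TARGET = [1.0, -2.0, 0.5]          # optimum of the toy loss sum((w - t)^2)
shared_params = [0.0, 0.0, 0.0]    # the "shared model", updated without locks


def worker(steps: int, lr: float) -> None:
    """One agent: sync a local copy, compute a gradient, push it lock-free."""
    for _ in range(steps):
        # 1. Pull the latest shared parameters into the agent's local copy.
        local = list(shared_params)
        # 2. Compute the gradient of the toy loss on the local copy.
        grads = [2.0 * (w - t) for w, t in zip(local, TARGET)]
        # 3. Apply the update to the shared parameters with no lock held
        #    (Hogwild!-style); other workers may write at the same time.
        for i, g in enumerate(grads):
            shared_params[i] -= lr * g


def train(num_workers: int = 4, steps: int = 200, lr: float = 0.05) -> list:
    """Run several agents concurrently and return the shared parameters."""
    threads = [threading.Thread(target=worker, args=(steps, lr))
               for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared_params


if __name__ == "__main__":
    print(train())
```

The design point is that no worker waits for another: an occasional overwritten update is tolerated in exchange for frequent, cheap writes to the shared parameters.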

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.7+, PyTorch, gym==0.26.2, spdlog, setproctitle.
  • Training BipedalWalker-v3: python main.py --env BipedalWalker-v3 --optimizer Adam --shared-optimizer --workers 8 --amsgrad --stop-when-solved --model-300-check --tensorboard-logger
  • Training BipedalWalkerHardcore-v3: python main.py --env BipedalWalkerHardcore-v3 --optimizer Adam --shared-optimizer --workers 18 --amsgrad --stop-when-solved --model-300-check --tensorboard-logger
  • Evaluation: python gym_eval.py --env BipedalWalkerHardcore-v3 --num-episodes 100
  • Official Docs: None linked; the project layout suggests a standard RL training setup.

Highlighted Details

  • Claims training BipedalWalkerHardcore-v3 to average 300+ reward in 20-40 minutes on CPU.
  • A3G architecture designed for substantially accelerated training, especially with larger models or pixel-based observations.
  • Solves BipedalWalker-v3 and the harder BipedalWalkerHardcore-v3, achieving average rewards over 300.
  • Utilizes spdlog for faster logging and setproctitle for process management.
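The "average reward over 300" criterion can be made concrete with a small check. This is a sketch only: the 300 threshold comes from the claims above, the 100-episode window mirrors the `--num-episodes 100` value in the evaluation command, and the function names are hypothetical. How the repository itself decides that an environment is solved is not stated here.

```python
from statistics import mean


def average_reward(episode_rewards, window=100):
    """Mean of the most recent `window` episode rewards."""
    if not episode_rewards:
        raise ValueError("need at least one episode reward")
    return mean(episode_rewards[-window:])


def is_solved(episode_rewards, threshold=300.0, window=100):
    """True once a full window of episodes averages at least `threshold`."""
    # Assumption: an incomplete window never counts as solved.
    if len(episode_rewards) < window:
        return False
    return average_reward(episode_rewards, window) >= threshold
```

For example, `is_solved(rewards)` could be called after each evaluation episode with the running list of total episode rewards.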

Maintenance & Community

  • No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.
  • Project appears to be a personal implementation; it cites ikostrikov/pytorch-a3c and andrewliao11/pytorch-a3c-mujoco as references.

Licensing & Compatibility

  • The README does not explicitly state a license. The referenced repositories use MIT and Apache 2.0 licenses, respectively. Compatibility for commercial use is undetermined without a clear license.

Limitations & Caveats

The README describes the project as a continuous action space version of A3C LSTM and highlights its training speed gains, but it does not discuss limitations such as performance on other environments. The project also depends on a single author, which is a bus-factor risk.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 90 days
