ACER by Kaixhin

RL research paper reproducing ACER algorithm

Created 8 years ago
254 stars

Top 99.1% on SourcePulse

Project Summary

ACER is an actor-critic reinforcement learning algorithm designed for improved stability through batch off-policy updates and optional trust region optimization. It targets researchers and practitioners in deep reinforcement learning seeking more robust and sample-efficient training.

How It Works

ACER (Actor-Critic with Experience Replay) combines an actor-critic architecture with experience replay, allowing the agent to learn from stored transitions in a batch, off-policy manner and decoupling data collection from policy updates. This improves stability and sample efficiency compared to purely on-policy methods. Trust region updates can optionally be enabled to further constrain how much each update changes the policy, preventing destructively large steps and promoting more stable learning.
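
To make the batch off-policy update concrete, below is a minimal PyTorch sketch of ACER's policy loss with truncated importance weights and a bias-correction term. It is not code from this repository; the tensor names (pi, mu, q, v, q_ret) and the truncation constant c are illustrative assumptions, and the Retrace target q_ret is assumed to be computed elsewhere.

    import torch

    def acer_policy_loss(pi, mu, q, v, actions, q_ret, c=10.0):
        # pi:      current policy probabilities, shape [batch, num_actions]
        # mu:      behaviour policy probabilities stored with the replayed transitions
        # q:       critic Q-value estimates, shape [batch, num_actions]
        # v:       state values, e.g. (pi * q).sum(1), shape [batch]
        # actions: actions taken by the behaviour policy, shape [batch], dtype long
        # q_ret:   Retrace targets for the taken actions (assumed precomputed/detached)
        rho = pi.detach() / (mu + 1e-8)                          # importance weights
        rho_a = rho.gather(1, actions.unsqueeze(1)).squeeze(1)
        log_pi_a = (pi.gather(1, actions.unsqueeze(1)).squeeze(1) + 1e-8).log()

        # Truncated importance-sampled policy gradient on the sampled actions
        loss = -(rho_a.clamp(max=c) * log_pi_a * (q_ret - v.detach())).mean()

        # Bias-correction term summed over all actions, weighted by the current policy
        correction = ((1 - c / rho).clamp(min=0) * pi.detach() * (pi + 1e-8).log()
                      * (q.detach() - v.detach().unsqueeze(1))).sum(1)
        return loss - correction.mean()

The critic's value loss is omitted; the sketch only conveys the structure of the off-policy policy update.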

Quick Start & Requirements

  • Install dependencies via conda env create -f environment.yml and activate with source activate acer.
  • Requires OpenAI Gym and PyTorch (a quick import check is sketched after this list).
  • Official documentation and demo links are not provided in the README.
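
As a quick post-install sanity check (a sketch, not part of the README; CartPole-v1 is simply a standard Gym environment used here for illustration):

    # Verify the conda environment provides the required packages
    import gym
    import torch

    print("PyTorch:", torch.__version__)
    print("Gym:", gym.__version__)

    # Instantiate a simple control task of the kind ACER is typically run on
    env = gym.make("CartPole-v1")
    print("Observation space:", env.observation_space)
    print("Action space:", env.action_space)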

Highlighted Details

  • Implements actor-critic with experience replay for stability.
  • Supports batch off-policy updates.
  • Optional trust region updates are available.
  • Can run asynchronous advantage actor-critic (A3C) with a Q-value head via the --on-policy flag (a sketch of such a two-head network follows this list).
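
To illustrate the last point, here is a minimal sketch of an actor-critic network with both a policy head and a Q-value head. It is not the repository's actual model definition; the layer sizes and names are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ActorCriticQ(nn.Module):
        # Shared torso with a policy head and a Q-value head (one Q per action);
        # the state value can be recovered as V(s) = sum_a pi(a|s) * Q(s, a).
        def __init__(self, obs_dim, num_actions, hidden=32):
            super().__init__()
            self.fc = nn.Linear(obs_dim, hidden)
            self.policy_head = nn.Linear(hidden, num_actions)
            self.q_head = nn.Linear(hidden, num_actions)

        def forward(self, obs):
            h = F.relu(self.fc(obs))
            pi = F.softmax(self.policy_head(h), dim=-1)   # action probabilities
            q = self.q_head(h)                            # Q-value per action
            return pi, q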

Maintenance & Community

The project acknowledges contributions from @ikostrikov and @apaszke. No specific community channels or roadmap information are provided.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not mentioned.

Limitations & Caveats

The implementation currently uses a full trust region update instead of an "efficient" trust region, as noted in issue #1. The README does not provide links to official documentation, demos, or community resources, potentially hindering adoption.
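
For context, the paper's "efficient" trust region adjusts the gradient with respect to the policy's output distribution directly, projecting it so that the KL divergence from a slowly updated average policy stays within a bound delta, rather than solving a full constrained step. A minimal sketch of that projection follows; it is illustrative only and, per the caveat above, not how this repository currently implements the update.

    import torch

    def trust_region_step(g, k, delta=1.0):
        # g: gradient of the ACER loss w.r.t. the policy's output distribution
        # k: gradient of KL(average_policy || current_policy) w.r.t. the same output
        # Remove just enough of the k-direction from g to keep the KL change bounded.
        k_dot_g = (k * g).sum()
        k_norm_sq = (k * k).sum() + 1e-8
        scale = torch.clamp((k_dot_g - delta) / k_norm_sq, min=0.0)
        return g - scale * k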

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days
