Reproduction of the ACER reinforcement learning research paper
ACER (Actor-Critic with Experience Replay) is an actor-critic reinforcement learning algorithm designed for improved stability through batched off-policy updates and optional trust region optimization. It targets researchers and practitioners in deep reinforcement learning seeking more robust and sample-efficient training.
How It Works
ACER combines an actor-critic architecture with experience replay: the agent learns from stored past transitions in batched, off-policy updates, decoupling data collection from policy optimization. This improves stability and sample efficiency over purely on-policy methods. Trust region updates can optionally be enabled to constrain how far each update moves the policy away from a running average policy, preventing destructively large policy changes and promoting more stable learning.
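The core off-policy correction in the ACER paper is truncated importance sampling with a bias-correction term taken over the current policy. The sketch below illustrates this for the discrete-action case; it is not the repository's code, and the acer_policy_loss helper and its tensor arguments are illustrative assumptions:

import torch

def acer_policy_loss(pi, mu, q, v, q_ret, action, c=10.0):
    # pi, mu: current and behaviour policy probabilities over actions (1-D tensors)
    # q: critic Q-values per action (1-D tensor); v: state value (scalar tensor)
    # q_ret: Retrace return for the taken action; action: integer index; c: truncation
    rho = pi.detach() / (mu + 1e-10)  # importance weights, no gradient through them
    # Truncated importance-sampled term for the action actually taken.
    g = rho[action].clamp(max=c) * torch.log(pi[action] + 1e-10) * (q_ret - v).detach()
    # Bias-correction term: an expectation over the current policy that is only
    # non-zero for actions whose importance weight exceeds the truncation c.
    correction = ((1 - c / rho).clamp(min=0) * pi.detach()
                  * torch.log(pi + 1e-10) * (q - v).detach()).sum()
    return -(g + correction)  # negated so that minimising ascends the objective

In the full algorithm this per-step loss is summed over a replayed trajectory, with the Retrace estimator providing q_ret and a separate loss training the critic.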
Quick Start & Requirements
Create the Conda environment with:
conda env create -f environment.yml
and activate it with:
source activate acer
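With the environment active, training can then be launched from the repository root. Assuming the standard main.py entry point (check the repository for the exact script name and available flags):

python main.py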
Highlighted Details
On-policy training can be enabled by passing the --on-policy flag.
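For example, assuming the same main.py entry point as in the Quick Start sketch above:

python main.py --on-policy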
Maintenance & Community
The project acknowledges contributions from @ikostrikov and @apaszke. No specific community channels or roadmap information are provided.
Licensing & Compatibility
The README does not specify a license. Compatibility for commercial use or closed-source linking is not mentioned.
Limitations & Caveats
The implementation currently uses a full trust region update instead of an "efficient" trust region, as noted in issue #1. The README does not provide links to official documentation, demos, or community resources, potentially hindering adoption.
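For reference, the efficient trust region described in the ACER paper amounts to a closed-form projection of the policy gradient taken with respect to the network's output statistics, rather than a full constrained optimization in parameter space. A minimal sketch, with illustrative names not taken from this repository:

import torch

def trust_region_projection(g, k, delta=1.0):
    # g: gradient of the ACER objective w.r.t. the policy's output statistics phi
    # k: gradient of KL(pi_avg || pi) w.r.t. phi, where pi_avg is the running
    #    average policy; delta: trust region bound
    k_dot_g = (k * g).sum()
    k_norm_sq = (k * k).sum().clamp(min=1e-10)
    scale = ((k_dot_g - delta) / k_norm_sq).clamp(min=0.0)
    # Projected direction z*; it is backpropagated from the statistics into the
    # network parameters in place of the raw gradient g.
    return g - scale * k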