Reproduction of the ACER reinforcement learning research paper
ACER (Actor-Critic with Experience Replay) is an actor-critic reinforcement learning algorithm designed for improved stability through batched off-policy updates and optional trust region optimization. It targets researchers and practitioners in deep reinforcement learning seeking more robust and sample-efficient training.
How It Works
ACER combines an actor-critic architecture with experience replay: the agent learns from stored past transitions in batched, off-policy updates, decoupling data collection from policy optimization. This improves stability and sample efficiency over purely on-policy methods. Trust region updates can optionally be enabled to constrain how far each update moves the policy away from a running average policy, preventing destructively large policy changes and promoting more stable learning.
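The core off-policy correction in the ACER paper is truncated importance sampling with a bias-correction term taken over the current policy. The sketch below illustrates this for the discrete-action case; it is not the repository's code, and the acer_policy_loss helper and its tensor arguments are illustrative assumptions:

import torch

def acer_policy_loss(pi, mu, q, v, q_ret, action, c=10.0):
    # pi, mu: current and behaviour policy probabilities over actions (1-D tensors)
    # q: critic Q-values per action (1-D tensor); v: state value (scalar tensor)
    # q_ret: Retrace return for the taken action; action: integer index; c: truncation
    rho = pi.detach() / (mu + 1e-10)  # importance weights, no gradient through them
    # Truncated importance-sampled term for the action actually taken.
    g = rho[action].clamp(max=c) * torch.log(pi[action] + 1e-10) * (q_ret - v).detach()
    # Bias-correction term: an expectation over the current policy that is only
    # non-zero for actions whose importance weight exceeds the truncation c.
    correction = ((1 - c / rho).clamp(min=0) * pi.detach()
                  * torch.log(pi + 1e-10) * (q - v).detach()).sum()
    return -(g + correction)  # negated so that minimising ascends the objective

In the full algorithm this per-step loss is summed over a replayed trajectory, with the Retrace estimator providing q_ret and a separate loss training the critic.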
Quick Start & Requirements
Create the Conda environment with:
conda env create -f environment.yml
and activate it with:
source activate acer
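With the environment active, training can then be launched from the repository root. Assuming the standard main.py entry point (check the repository for the exact script name and available flags):

python main.py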
Highlighted Details
On-policy training can be enabled by passing the --on-policy flag.
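For example, assuming the same main.py entry point as in the Quick Start sketch above:

python main.py --on-policy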
Maintenance & Community
The project acknowledges contributions from @ikostrikov and @apaszke. No specific community channels or roadmap information are provided.
Licensing & Compatibility
The README does not specify a license. Compatibility for commercial use or closed-source linking is not mentioned.
Limitations & Caveats
The implementation currently uses a full trust region update instead of an "efficient" trust region, as noted in issue #1. The README does not provide links to official documentation, demos, or community resources, potentially hindering adoption.
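For reference, the efficient trust region described in the ACER paper amounts to a closed-form projection of the policy gradient taken with respect to the network's output statistics, rather than a full constrained optimization in parameter space. A minimal sketch, with illustrative names not taken from this repository:

import torch

def trust_region_projection(g, k, delta=1.0):
    # g: gradient of the ACER objective w.r.t. the policy's output statistics phi
    # k: gradient of KL(pi_avg || pi) w.r.t. phi, where pi_avg is the running
    #    average policy; delta: trust region bound
    k_dot_g = (k * g).sum()
    k_norm_sq = (k * k).sum().clamp(min=1e-10)
    scale = ((k_dot_g - delta) / k_norm_sq).clamp(min=0.0)
    # Projected direction z*; it is backpropagated from the statistics into the
    # network parameters in place of the raw gradient g.
    return g - scale * k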