mrahtz / RL from Human Preferences reproduction
Top 82.9% on SourcePulse
This repository provides a reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences" paper, enabling users to train agents using human feedback. It targets researchers and practitioners interested in preference-based RL, offering a practical implementation for environments like Pong and Enduro.
How It Works
The project employs an asynchronous architecture with three main components: A2C workers for environment interaction and policy training, a preference interface for collecting human feedback on agent behavior clips, and a reward predictor network. Video clips generated by A2C workers are queued and presented in pairs by the preference interface. Human preferences are then fed to the reward predictor, which trains a neural network to estimate reward signals from agent behavior. These predicted rewards are used to train the A2C workers, creating a closed loop for preference-based learning.
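For illustration, the sketch below shows how a reward predictor can be fit to pairwise human preferences using the Bradley-Terry model described in the paper: the probability that the human prefers one clip is the sigmoid of the difference in predicted clip returns. This is a minimal, standalone example written in PyTorch rather than the repository's TensorFlow 1.15 code; the network shape, class names, and clip dimensions are assumptions, not the project's actual API.

```python
# Illustrative sketch (PyTorch), not the repository's TensorFlow 1.15 implementation.
# A reward predictor is trained on pairs of clips: P(human prefers clip1) is modeled
# as sigmoid(R(clip1) - R(clip2)), where R is the summed per-step predicted reward.
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def clip_return(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (T, obs_dim) -> scalar predicted return over the whole clip
        return self.net(clip).sum()

def preference_loss(model: RewardPredictor,
                    clip1: torch.Tensor,
                    clip2: torch.Tensor,
                    pref: float) -> torch.Tensor:
    # pref = 1.0 if the human preferred clip1, 0.0 for clip2, 0.5 if indifferent
    r1, r2 = model.clip_return(clip1), model.clip_return(clip2)
    p1 = torch.sigmoid(r1 - r2)  # Bradley-Terry preference probability
    return -(pref * torch.log(p1 + 1e-8) + (1 - pref) * torch.log(1 - p1 + 1e-8))

# Hypothetical usage: two 25-step clips of flattened observations
model = RewardPredictor(obs_dim=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
clip1, clip2 = torch.randn(25, 16), torch.randn(25, 16)
loss = preference_loss(model, clip1, clip2, pref=1.0)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the full training loop, the predicted per-step rewards from this network replace the environment reward when training the A2C workers, closing the preference-based learning loop described above.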
Quick Start & Requirements
pipenv install
pipenv run pip install tensorflow==1.15 (or tensorflow-gpu==1.15 for GPU support)
pipenv install --dev
pipenv shell
python3 run.py <mode> <environment>
Supported environments: MovingDotNoFrameskip-v0, PongNoFrameskip-v4, EnduroNoFrameskip-v4
Highlighted Details
Maintenance & Community
The last update was roughly 3 years ago, and the repository is marked inactive.
Licensing & Compatibility
Limitations & Caveats