rl-teacher by nottombrown

RL from human preferences via a webapp for feedback collection

created 8 years ago
562 stars

Top 58.1% on sourcepulse

Project Summary

This repository provides an implementation of Deep Reinforcement Learning from Human Preferences, enabling users to train RL agents for tasks lacking explicit reward functions. It's suitable for researchers and practitioners interested in shaping agent behavior through human feedback, offering a novel approach to complex control problems.

How It Works

The system comprises a reward predictor trained on human preferences, which is then integrated into standard RL algorithms (TRPO, PPO). Human feedback is collected via a web application that presents pairs of trajectory segments for comparison. This approach allows agents to learn nuanced behaviors that are difficult to define with traditional reward functions.
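For intuition, here is a minimal sketch of the preference-learning step. It is illustrative only: the repository itself uses TensorFlow, whereas this sketch uses PyTorch, and the names (RewardPredictor, segment_return, preference_loss), network shape, and dimensions are assumptions. The idea it demonstrates is the one described above: the predictor scores each observation, scores are summed over each clip, and a Bradley-Terry-style cross-entropy loss fits the softmax over the two clip sums to the human's choice.

```python
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Maps a single observation vector to a scalar reward estimate."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def segment_return(self, segment):
        # segment: (T, obs_dim) tensor -> predicted total reward over the clip
        return self.net(segment).sum()

def preference_loss(model, seg_a, seg_b, label):
    """Cross-entropy on one comparison: label 0 = human preferred A, 1 = B."""
    logits = torch.stack([model.segment_return(seg_a),
                          model.segment_return(seg_b)])
    return nn.functional.cross_entropy(logits.unsqueeze(0),
                                       torch.tensor([label]))

# Toy usage: two random 25-step clips of 11-dim observations, human prefers A.
model = RewardPredictor(obs_dim=11)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = preference_loss(model, torch.randn(25, 11), torch.randn(25, 11), label=0)
opt.zero_grad()
loss.backward()
opt.step()
```

In practice the predictor is retrained as new comparisons arrive from the web UI, so the reward signal improves alongside the policy.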

Quick Start & Requirements

  • Installation: Requires MuJoCo and Python 3.5; install the main package and its related sub-packages with pip install -e .
  • Prerequisites: A MuJoCo license and binaries, plus a TensorFlow backend (a quick import check is sketched after this list).
  • Setup: Involves setting up a conda environment, cloning the repo, installing dependencies, and configuring Google Cloud Storage for media uploads.
  • Resources: Training requires significant computational resources, especially for rendering video segments.
  • Links: MuJoCo installation, human-feedback-api.
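A quick prerequisite check can save time before training. The sketch below is not part of the project; it only verifies the interpreter version and the availability of the MuJoCo and TensorFlow imports implied by the README.

```python
import sys

# Illustrative sanity check for the README's stated prerequisites.
assert sys.version_info >= (3, 5), "rl-teacher's README targets Python 3.5+"

for module in ("mujoco_py", "tensorflow"):
    try:
        __import__(module)  # mujoco_py also needs the MuJoCo binaries and license key
        print(module, "OK")
    except ImportError as exc:
        print("Missing prerequisite:", module, "-", exc)
```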

Highlighted Details

  • Implements Deep RL from Human Preferences (Christiano et al., 2017).
  • Includes a web UI for collecting human feedback on trajectory comparisons.
  • Supports standard RL algorithms such as TRPO and PPO (an illustrative integration sketch follows this list).
  • Demonstrates training agents for non-standard behaviors (e.g., ballet for a walker robot).
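To show how a learned reward can slot into an otherwise unchanged TRPO/PPO loop, here is a hypothetical Gym wrapper. The wrapper class, the reward_fn callable, and the environment name are assumptions for the sketch, it uses the classic 4-tuple gym step API, and it is not this repository's actual integration code.

```python
import gym
import numpy as np

class PredictedRewardWrapper(gym.Wrapper):
    """Swap the environment's true reward for a learned reward model's output."""

    def __init__(self, env, reward_fn):
        super().__init__(env)
        self.reward_fn = reward_fn  # callable: observation -> float

    def step(self, action):
        obs, true_reward, done, info = self.env.step(action)
        info["true_reward"] = true_reward  # keep the real reward for evaluation only
        return obs, float(self.reward_fn(obs)), done, info

# Usage sketch: a standard TRPO/PPO implementation trains on `env` unmodified.
# In practice reward_fn would wrap the trained predictor; a dummy is used here.
env = PredictedRewardWrapper(gym.make("CartPole-v1"),
                             reward_fn=lambda obs: float(np.tanh(obs[0])))
```

Because the RL algorithm only ever sees the predicted reward, no change to the policy-optimization code is needed; evaluation against the true reward (when one exists) can use the value stashed in info.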

Maintenance & Community

The project acknowledges contributions from Paul Christiano, Dario Amodei, Max Harms (@raelifin), Catherine Olsson (@catherio), and Kevin Frans (@kvfrans); an Atari extension is also noted.

Licensing & Compatibility

The README does not state a license, which creates uncertainty for commercial or closed-source use.

Limitations & Caveats

The README targets Python 3.5, which is now outdated. Headless video rendering on Linux requires manually installing Xdummy and other dependencies, and media storage relies on Google Cloud Storage, so a cloud bucket must be configured.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 90 days
