batch-ppo by google-research

TensorFlow infrastructure for batched reinforcement learning

Created 8 years ago

974 stars

Top 37.9% on SourcePulse

7 Experts Love This Project

Lilian Weng

Cofounder of Thinking Machines Lab

Aravind Srinivas

Cofounder of Perplexity

Luis Capelo

Cofounder of Lightning AI

Deepak Pathak

Cofounder of Skild AI; Professor at CMU

and 3 more!

Project Summary

This project provides an optimized infrastructure for reinforcement learning agents implemented in TensorFlow, specifically targeting efficient batched computation across multiple parallel environments. It offers an implementation of Proximal Policy Optimization (PPO) as a starting point for researchers and practitioners looking to build and experiment with RL algorithms.

How It Works

The project's central piece is a batched environment interface built for TensorFlow. agents.tools.wrappers.ExternalProcess runs each Gym environment in its own process, so environment steps are not serialized by Python's global interpreter lock (GIL) and can run in parallel. agents.tools.BatchEnv combines these environments behind a single interface that accepts a batch of actions and returns batches of observations, rewards, done flags, and info objects. agents.tools.InGraphBatchEnv brings the batch environment into the TensorFlow graph and exposes its step and reset functions as operations. Finally, agents.tools.simulate() fuses the in-graph environment step and the agent update into a single TensorFlow operation that the training loop calls.
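The batching idea can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: ToyEnv and ToyBatchEnv are hypothetical names invented for the example, the environments are stepped one after another in a single process (the real project places each one in a separate process), and no TensorFlow graph is involved. It only shows the shape of the interface: one action per environment goes in, one batch per return value comes out.

```python
class ToyEnv:
    """Hypothetical stand-in for a Gym-style environment (not part of batch-ppo)."""

    def __init__(self, horizon=3):
        self._horizon = horizon
        self._t = 0

    def reset(self):
        self._t = 0
        return 0.0  # initial observation

    def step(self, action):
        self._t += 1
        observ = float(self._t)
        reward = -abs(action)            # toy reward: smaller actions score higher
        done = self._t >= self._horizon  # episode ends after `horizon` steps
        return observ, reward, done, {}


class ToyBatchEnv:
    """Hypothetical wrapper that steps several environments with one call."""

    def __init__(self, envs):
        self._envs = list(envs)

    def __len__(self):
        return len(self._envs)

    def reset(self):
        return [env.reset() for env in self._envs]

    def step(self, actions):
        if len(actions) != len(self._envs):
            raise ValueError("need exactly one action per environment")
        # Step each environment with its own action, then regroup the
        # per-environment tuples into one batch per return value.
        transitions = [env.step(a) for env, a in zip(self._envs, actions)]
        observs, rewards, dones, infos = zip(*transitions)
        return list(observs), list(rewards), list(dones), list(infos)


batch = ToyBatchEnv(ToyEnv(horizon=2) for _ in range(3))
batch.reset()
observs, rewards, dones, infos = batch.step([0.5, -1.0, 2.0])
print(observs, rewards, dones)
```

Because the caller only ever sees batches, the same training code works whether the wrapper holds one environment or many.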

Quick Start & Requirements

Install: Clone the repository.
Run: python3 -m agents.scripts.train --logdir=/path/to/logdir --config=pendulum
Prerequisites: Python 2/3, TensorFlow 1.3+, Gym, ruamel.yaml.
Visualization: tensorboard --logdir=/path/to/logdir --port=2222
Rendering/Stats: python3 -m agents.scripts.visualize --logdir=/path/to/logdir/<time>-<config> --outdir=/path/to/outdir/
Docs: The TensorFlow Agents paper, which the README asks you to cite if you use the code.
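For scripting these steps, the commands above can be assembled as argument lists. The sketch below is only a convenience wrapper around the flags already shown; the helper names (train_command, tensorboard_command, visualize_command) are hypothetical and not part of the project. It builds the commands without running them, and the run directory passed to visualize_command is whatever directory training created under the log directory.

```python
import shlex


def train_command(logdir, config):
    """Argument list for the training script (flags taken from the Run line)."""
    return ["python3", "-m", "agents.scripts.train",
            "--logdir={}".format(logdir), "--config={}".format(config)]


def tensorboard_command(logdir, port=2222):
    """Argument list for TensorBoard pointed at the same log directory."""
    return ["tensorboard",
            "--logdir={}".format(logdir), "--port={}".format(port)]


def visualize_command(run_dir, outdir):
    """Argument list for the visualize script; run_dir is a training run's directory."""
    return ["python3", "-m", "agents.scripts.visualize",
            "--logdir={}".format(run_dir), "--outdir={}".format(outdir)]


logdir = "/path/to/logdir"
for argv in (train_command(logdir, "pendulum"), tensorboard_command(logdir)):
    # shlex.quote keeps the printed line safe to paste into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each list can be passed to subprocess.run(argv, check=True) once the prerequisites are installed.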

Highlighted Details

Optimized infrastructure for batched reinforcement learning in TensorFlow.
Efficient parallel environment execution using external processes.
Integrated TensorFlow graph operations for environment stepping.
Single-operation fusion of environment steps and agent updates.

Maintenance & Community

For questions, open an issue on GitHub.

Licensing & Compatibility

License: Not stated in the README. Terms for commercial use or closed-source linking are therefore not specified there; check the repository's license file before relying on either.

Limitations & Caveats

The project targets TensorFlow 1.3+, a release line that is now significantly outdated. The README lists Python 2 and 3 as supported, but any current use would realistically be on Python 3. The README does not mention GPU support or CUDA requirements, although TensorFlow workloads typically benefit from a GPU.

Health Check

Last Commit

7 years ago

Responsiveness

1 week

Star History

3 stars in the last 30 days