openrlbenchmark by openrlbenchmark

RL experiment benchmarking and comparison

Created 3 years ago
252 stars

Top 99.6% on SourcePulse

Project Summary

Open RL Benchmark provides a unified platform for comparing reinforcement learning (RL) experiments across popular libraries. It addresses the challenge of disparate tracking and reporting by offering a CLI tool to aggregate and visualize metrics from sources like Stable-Baselines3, CleanRL, and Tianshou, enabling practitioners to easily benchmark and analyze RL agent performance.

How It Works

The project centers on a command-line interface (CLI) tool, openrlbenchmark.rlops, that queries and consolidates experiment metrics from Weights & Biases (W&B). Users define filters specifying W&B entities, projects, environment identifiers, experiment names, and the metrics of interest. The tool fetches the matching runs, generates comparative plots (learning curves, performance profiles, aggregate scores) and tabular results, and thereby enables direct comparison of algorithms and implementations.
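The filter format can be sketched as a URL-style query string. The key names below (`we` for the W&B entity, `wpn` for the project name, `ceik` for the environment-id key, `cen` for the experiment-name key, plus `metric`) follow the convention seen in upstream examples, but treat the exact keys as assumptions and verify them against the project's README; the snippet only illustrates how such a string decomposes:

```python
# Decompose an rlops-style filter string into its fields.
# Key names (we, wpn, ceik, cen, metric) are assumed from upstream
# examples; verify against the openrlbenchmark README.
from urllib.parse import parse_qs, urlparse

filter_str = (
    "?we=openrlbenchmark&wpn=cleanrl"
    "&ceik=env_id&cen=exp_name&metric=charts/episodic_return"
)

fields = {key: values[0] for key, values in
          parse_qs(urlparse(filter_str).query).items()}
# fields["wpn"] -> "cleanrl"; fields["metric"] -> "charts/episodic_return"
```

Each `--filters` argument on the command line pairs one such query string with the experiment names to fetch, so several libraries can be pulled into a single comparison.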

Quick Start & Requirements

Installation is straightforward via pip: pip install openrlbenchmark --upgrade. For development, clone the repository and use Poetry: git clone https://github.com/openrlbenchmark/openrlbenchmark.git && cd openrlbenchmark && poetry install. Prerequisites are Python >=3.7.1,<3.10 and, for development, Poetry 1.2.1+.

Highlighted Details

  • Supports a wide array of RL libraries including CleanRL, Stable-Baselines3, Tianshou, jaxrl, OpenAI Baselines, and MORL-Baselines, with configurable metric extraction.
  • Generates comprehensive visualizations, including learning curves with standard deviation, performance profiles, and aggregate metrics such as the interquartile mean (IQM) of human-normalized scores.
  • Features an experimental offline mode that builds a local SQLite database for faster plotting without continuous internet access.
  • Enables multi-metric comparison within a single analysis, offering deeper insights into agent behavior and training dynamics.
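The IQM mentioned above can be illustrated with a minimal sketch. This simplified version (no fractional trimming or bootstrapped confidence intervals, which dedicated evaluation libraries provide) just averages the middle 50% of the sorted scores:

```python
# Interquartile mean (IQM): mean of the middle ~50% of scores, more
# robust to outliers than a plain mean. Simplified sketch: trims whole
# elements from each tail rather than interpolating fractional quartiles.
def iqm(scores):
    s = sorted(scores)
    k = len(s) // 4            # number of scores to trim from each tail
    trimmed = s[k:len(s) - k]  # middle portion of the sorted scores
    return sum(trimmed) / len(trimmed)

iqm([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])  # middle half 2..5 -> 3.5
```

Because the extreme quartiles are discarded, a single diverging seed cannot dominate the aggregate the way it would with a plain mean over seeds.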

Maintenance & Community

The project is under active, albeit slow, development with no fixed roadmap. Contributions are welcomed, particularly in adding experiments from new libraries, expanding coverage for existing ones, and improving documentation. Interested parties can reach out to the maintainers or open GitHub issues. A citation is provided for academic use.

Licensing & Compatibility

The specific open-source license is not explicitly stated in the provided README content. Compatibility for commercial use or closed-source linking would require clarification of the license terms.

Limitations & Caveats

The project currently has a Python version constraint, requiring Python <3.10. The offline mode is experimental. Development is community-driven with no defined roadmap, relying on volunteer contributions for expansion and maintenance.

Health Check

Last commit: 1 year ago
Responsiveness: inactive
Pull requests (30d): 0
Issues (30d): 0
Star history: 2 stars in the last 30 days

Explore Similar Projects

Starred by Clement Delangue (cofounder of Hugging Face), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 12 more.

evaluate by huggingface: ML model evaluation library for standardized performance reporting (2k stars; created 3 years ago, updated 1 month ago). Starred by Morgan Funtowicz (Head of ML Optimizations at Hugging Face), Luis Capelo (cofounder of Lightning AI), and 8 more.

lighteval by huggingface: LLM evaluation toolkit for multiple backends (2k stars; created 2 years ago, updated 5 days ago).