openrlbenchmark by openrlbenchmark

RL experiment benchmarking and comparison

Created 3 years ago
252 stars

Top 99.6% on SourcePulse

Project Summary

Open RL Benchmark provides a unified platform for comparing reinforcement learning (RL) experiments across popular libraries. It addresses the challenge of disparate tracking and reporting by offering a CLI tool to aggregate and visualize metrics from sources like Stable-Baselines3, CleanRL, and Tianshou, enabling practitioners to easily benchmark and analyze RL agent performance.

How It Works

The project centers on a command-line interface (CLI) tool, openrlbenchmark.rlops, that queries and consolidates experiment metrics from Weights & Biases (W&B). Users define filters specifying W&B entities, projects, environment identifiers, experiment names, and the metrics of interest. The tool fetches the matching runs, generates comparative plots (learning curves, performance profiles, aggregate scores) and tabular results, and thereby enables direct comparison of algorithms and implementations.
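The filter format can be sketched as a URL-style query string. The key names below (`we` for the W&B entity, `wpn` for the project name, `ceik` for the environment-id key, `cen` for the experiment-name key, plus `metric`) follow the convention seen in upstream examples, but treat the exact keys as assumptions and verify them against the project's README; the snippet only illustrates how such a string decomposes:

```python
# Decompose an rlops-style filter string into its fields.
# Key names (we, wpn, ceik, cen, metric) are assumed from upstream
# examples; verify against the openrlbenchmark README.
from urllib.parse import parse_qs, urlparse

filter_str = (
    "?we=openrlbenchmark&wpn=cleanrl"
    "&ceik=env_id&cen=exp_name&metric=charts/episodic_return"
)

fields = {key: values[0] for key, values in
          parse_qs(urlparse(filter_str).query).items()}
# fields["wpn"] -> "cleanrl"; fields["metric"] -> "charts/episodic_return"
```

Each `--filters` argument on the command line pairs one such query string with the experiment names to fetch, so several libraries can be pulled into a single comparison.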

Quick Start & Requirements

Installation is straightforward via pip: pip install openrlbenchmark --upgrade. For development, clone the repository and use Poetry: git clone https://github.com/openrlbenchmark/openrlbenchmark.git && cd openrlbenchmark && poetry install. Prerequisites are Python >=3.7.1,<3.10 and, for development, Poetry 1.2.1+.

Highlighted Details

  • Supports a wide array of RL libraries including CleanRL, Stable-Baselines3, Tianshou, jaxrl, OpenAI Baselines, and MORL-Baselines, with configurable metric extraction.
  • Generates comprehensive visualizations, including learning curves with standard deviation, performance profiles, and aggregate metrics such as the interquartile mean (IQM) of human-normalized scores.
  • Features an experimental offline mode that builds a local SQLite database for faster plotting without continuous internet access.
  • Enables multi-metric comparison within a single analysis, offering deeper insights into agent behavior and training dynamics.
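The IQM mentioned above can be illustrated with a minimal sketch. This simplified version (no fractional trimming or bootstrapped confidence intervals, which dedicated evaluation libraries provide) just averages the middle 50% of the sorted scores:

```python
# Interquartile mean (IQM): mean of the middle ~50% of scores, more
# robust to outliers than a plain mean. Simplified sketch: trims whole
# elements from each tail rather than interpolating fractional quartiles.
def iqm(scores):
    s = sorted(scores)
    k = len(s) // 4            # number of scores to trim from each tail
    trimmed = s[k:len(s) - k]  # middle portion of the sorted scores
    return sum(trimmed) / len(trimmed)

iqm([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])  # middle half 2..5 -> 3.5
```

Because the extreme quartiles are discarded, a single diverging seed cannot dominate the aggregate the way it would with a plain mean over seeds.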

Maintenance & Community

The project is under active, albeit slow, development with no fixed roadmap. Contributions are welcomed, particularly in adding experiments from new libraries, expanding coverage for existing ones, and improving documentation. Interested parties can reach out to the maintainers or open GitHub issues. A citation is provided for academic use.

Licensing & Compatibility

The specific open-source license is not explicitly stated in the provided README content. Compatibility for commercial use or closed-source linking would require clarification of the license terms.

Limitations & Caveats

The project currently has a Python version constraint, requiring Python <3.10. The offline mode is experimental. Development is community-driven with no defined roadmap, relying on volunteer contributions for expansion and maintenance.

Health Check

Last commit: 1 year ago
Responsiveness: inactive
Pull requests (30d): 0
Issues (30d): 0
Star history: 2 stars in the last 30 days

Explore Similar Projects

Starred by Clement Delangue (cofounder of Hugging Face), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 12 more.

evaluate by huggingface: ML model evaluation library for standardized performance reporting (2k stars; created 3 years ago, updated 1 month ago). Starred by Morgan Funtowicz (Head of ML Optimizations at Hugging Face), Luis Capelo (cofounder of Lightning AI), and 8 more.

lighteval by huggingface: LLM evaluation toolkit for multiple backends (2k stars; created 2 years ago, updated 5 days ago).