LLM-based reward design for reinforcement learning
Eureka addresses the challenge of designing effective reward functions for complex manipulation tasks, enabling reinforcement learning agents to achieve human-level performance. It targets researchers and engineers working with sequential decision-making and robotics, offering a method to automatically generate high-quality rewards using large language models (LLMs) without manual engineering.
How It Works
Eureka leverages LLMs like GPT-4 to iteratively generate and refine reward functions written in Python. It employs an evolutionary optimization approach, where the LLM proposes new reward code based on previous iterations' performance. This code is then integrated into an RL environment (specifically Isaac Gym) and evaluated. The LLM uses this feedback to improve subsequent reward function generations, effectively performing in-context learning to discover optimal reward structures. This method bypasses the need for task-specific prompting or pre-defined reward templates, leading to more generalizable and performant rewards.
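The loop below is a minimal sketch of this generate-train-reflect cycle, not Eureka's actual API: `generate_reward_code`, `train_policy`, and the `TrainResult` fields are hypothetical stand-ins for the GPT-4 call, the Isaac Gym training run, and the training statistics that get fed back into the next prompt.

```python
import random
from dataclasses import dataclass

@dataclass
class TrainResult:
    task_score: float   # task fitness of the policy trained with this reward
    reward_stats: str   # textual training feedback for the next LLM prompt

# Hypothetical stand-ins: an LLM call that returns reward-function source
# code, and an RL training run that evaluates it.
def generate_reward_code(task: str, feedback: str) -> str:
    return f"# candidate reward for {task!r} (feedback: {feedback!r})"

def train_policy(reward_code: str) -> TrainResult:
    return TrainResult(task_score=random.random(), reward_stats="mean reward: ...")

def eureka_loop(task: str, iterations: int = 5, samples: int = 16) -> str:
    """Evolutionary search over LLM-generated reward functions (sketch)."""
    best_code, best_score, feedback = "", float("-inf"), ""
    for _ in range(iterations):
        # The LLM proposes several candidate rewards per round, conditioned
        # on the previous round's training feedback (in-context refinement).
        candidates = [generate_reward_code(task, feedback) for _ in range(samples)]
        results = [(code, train_policy(code)) for code in candidates]
        code, result = max(results, key=lambda r: r[1].task_score)
        if result.task_score > best_score:
            best_code, best_score = code, result.task_score
        feedback = result.reward_stats  # carried into the next round's prompt
    return best_code

if __name__ == "__main__":
    print(eureka_loop("spin a pen with a five-fingered hand"))
```

Sampling several candidates per round and keeping only the best is what makes the search evolutionary rather than a single chain of revisions: weak reward hypotheses are discarded instead of refined.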
Quick Start & Requirements
- Install the package: `pip install -e .`
- Set the `OPENAI_API_KEY` environment variable.
- Run: `python eureka.py env={environment} iteration={num_iterations} sample={num_samples}`
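For example, a run on the pen-spinning dexterous-hand task from the paper might look like the following (the `shadow_hand` environment name is assumed, not confirmed by this summary):

```
python eureka.py env=shadow_hand iteration=5 sample=16
```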
Highlighted Details
Maintenance & Community
The project is associated with the ICLR 2024 paper "Eureka: Human-Level Reward Design via Coding Large Language Models." No specific community channels (Discord/Slack) or active maintenance signals are provided in the README.
Licensing & Compatibility
Limitations & Caveats
Eureka relies heavily on the OpenAI API, which requires an API key and incurs usage costs. Performance depends on the capabilities of the chosen LLM. The project is research code and not an official NVIDIA product.