Eureka by eureka-research

LLM-based reward design for reinforcement learning

created 1 year ago
3,020 stars

Top 16.2% on sourcepulse

View on GitHub
Project Summary

Eureka addresses the challenge of designing effective reward functions for complex manipulation tasks, enabling reinforcement learning agents to achieve human-level performance. It targets researchers and engineers working with sequential decision-making and robotics, offering a method to automatically generate high-quality rewards using large language models (LLMs) without manual engineering.

How It Works

Eureka leverages LLMs like GPT-4 to iteratively generate and refine reward functions written in Python. It employs an evolutionary optimization approach, where the LLM proposes new reward code based on previous iterations' performance. This code is then integrated into an RL environment (specifically Isaac Gym) and evaluated. The LLM uses this feedback to improve subsequent reward function generations, effectively performing in-context learning to discover optimal reward structures. This method bypasses the need for task-specific prompting or pre-defined reward templates, leading to more generalizable and performant rewards.
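The generate-evaluate-refine loop described above can be sketched as follows. This is an illustrative mock, not Eureka's actual API: `propose_rewards` stands in for the GPT-4 call and `evaluate` stands in for RL training in Isaac Gym, so all function names and the toy fitness score are hypothetical.

```python
def propose_rewards(prompt, num_samples):
    """Stand-in for an LLM call: returns candidate reward-function sources.

    In Eureka this would query GPT-4 with the task description plus the
    previous best reward code and its training statistics.
    """
    return [f"def reward(obs, action):\n    return -abs(obs) * {i + 1}"
            for i in range(num_samples)]

def evaluate(reward_src):
    """Stand-in for RL training in Isaac Gym: returns a fitness score."""
    namespace = {}
    exec(reward_src, namespace)          # compile the generated reward code
    reward_fn = namespace["reward"]
    # Toy fitness: probe the reward on a fixed observation.
    return reward_fn(0.5, None)

def eureka_loop(task_prompt, iterations=3, samples=4):
    """Evolutionary search: sample candidates, keep the best, reflect."""
    best_src, best_score = None, float("-inf")
    prompt = task_prompt
    for _ in range(iterations):
        candidates = propose_rewards(prompt, samples)
        scored = [(evaluate(src), src) for src in candidates]
        score, src = max(scored)
        if score > best_score:
            best_src, best_score = src, score
        # Reflection: the next prompt carries the best code and its score,
        # so the LLM can improve on it in-context.
        prompt = f"{task_prompt}\nPrevious best (score {score}):\n{src}"
    return best_src, best_score
```

The key design point is that feedback flows back purely through the prompt: no gradients touch the LLM, which is why the paper frames this as in-context reward evolution.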

Quick Start & Requirements

  • Installation: Requires Python 3.8+, Conda, Isaac Gym (Preview Release 4/4), and an OpenAI API key.
  • Setup: Clone the repository, create a Conda environment, install Isaac Gym, then install Eureka and its dependencies (pip install -e .). Set the OPENAI_API_KEY environment variable.
  • Running: Execute python eureka.py env={environment} iteration={num_iterations} sample={num_samples}.
  • Resources: Requires access to OpenAI API.
  • Links: Website, arXiv, Isaac Gym
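The setup steps above can be sketched as a shell session. The environment name, Isaac Gym path, and example arguments are illustrative; consult the repository README for the authoritative commands.

```shell
# Clone the repository and create a Conda environment (name is illustrative)
git clone https://github.com/eureka-research/Eureka.git
cd Eureka
conda create -n eureka python=3.8 -y
conda activate eureka

# Install Isaac Gym from its extracted download directory (path is illustrative)
cd /path/to/isaacgym/python
pip install -e .
cd -

# Install Eureka and its dependencies
pip install -e .

# Provide the OpenAI API key
export OPENAI_API_KEY="..."

# Run Eureka, e.g. on the Shadow Hand task with 5 iterations of 16 samples
python eureka.py env=shadow_hand iteration=5 sample=16
```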

Highlighted Details

  • Outperforms human expert rewards in 83% of 29 diverse RL environments, achieving a 52% average normalized improvement.
  • Enables gradient-free RLHF by incorporating human oversight into reward generation.
  • Successfully trained a simulated five-finger Shadow Hand for human-speed pen spinning.
  • Demonstrates generality by supporting new custom environments with minimal configuration.

Maintenance & Community

The project is associated with the ICLR 2024 paper "Eureka: Human-Level Reward Design via Coding Large Language Models." No specific community channels (Discord/Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

Eureka relies heavily on the OpenAI API, which incurs usage costs and requires an API key. Performance depends on the capabilities of the chosen LLM. The project is primarily for research purposes and is not an official NVIDIA product.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 68 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Chip Huyen (author of AI Engineering and Designing Machine Learning Systems).

LlamaGym by KhoomeiK

1k stars · created 1 year ago · updated 1 year ago
SDK for fine-tuning LLM agents with online reinforcement learning
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Shawn Wang (editor of Latent Space), and 2 more.

self-rewarding-lm-pytorch by lucidrains

1k stars · created 1 year ago · updated 1 year ago
Training framework for self-rewarding language models