LLM-based reward design for reinforcement learning
Eureka addresses the challenge of designing effective reward functions for complex manipulation tasks, enabling reinforcement learning agents to achieve human-level performance. It targets researchers and engineers working with sequential decision-making and robotics, offering a method to automatically generate high-quality rewards using large language models (LLMs) without manual engineering.
How It Works
Eureka leverages LLMs like GPT-4 to iteratively generate and refine reward functions written in Python. It employs an evolutionary optimization approach, where the LLM proposes new reward code based on previous iterations' performance. This code is then integrated into an RL environment (specifically Isaac Gym) and evaluated. The LLM uses this feedback to improve subsequent reward function generations, effectively performing in-context learning to discover optimal reward structures. This method bypasses the need for task-specific prompting or pre-defined reward templates, leading to more generalizable and performant rewards.
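The loop below is a minimal sketch of this generate-train-reflect cycle, not Eureka's actual API: `generate_reward_code`, `train_policy`, and the `TrainResult` fields are hypothetical stand-ins for the GPT-4 call, the Isaac Gym training run, and the training statistics that get fed back into the next prompt.

```python
import random
from dataclasses import dataclass

@dataclass
class TrainResult:
    task_score: float   # task fitness of the policy trained with this reward
    reward_stats: str   # textual training feedback for the next LLM prompt

# Hypothetical stand-ins: an LLM call that returns reward-function source
# code, and an RL training run that evaluates it.
def generate_reward_code(task: str, feedback: str) -> str:
    return f"# candidate reward for {task!r} (feedback: {feedback!r})"

def train_policy(reward_code: str) -> TrainResult:
    return TrainResult(task_score=random.random(), reward_stats="mean reward: ...")

def eureka_loop(task: str, iterations: int = 5, samples: int = 16) -> str:
    """Evolutionary search over LLM-generated reward functions (sketch)."""
    best_code, best_score, feedback = "", float("-inf"), ""
    for _ in range(iterations):
        # The LLM proposes several candidate rewards per round, conditioned
        # on the previous round's training feedback (in-context refinement).
        candidates = [generate_reward_code(task, feedback) for _ in range(samples)]
        results = [(code, train_policy(code)) for code in candidates]
        code, result = max(results, key=lambda r: r[1].task_score)
        if result.task_score > best_score:
            best_code, best_score = code, result.task_score
        feedback = result.reward_stats  # carried into the next round's prompt
    return best_code

if __name__ == "__main__":
    print(eureka_loop("spin a pen with a five-fingered hand"))
```

Sampling several candidates per round and keeping only the best is what makes the search evolutionary rather than a single chain of revisions: weak reward hypotheses are discarded instead of refined.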
Quick Start & Requirements
- Install the package: `pip install -e .`
- Set the `OPENAI_API_KEY` environment variable.
- Run: `python eureka.py env={environment} iteration={num_iterations} sample={num_samples}`
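For example, a run on the pen-spinning dexterous-hand task from the paper might look like the following (the `shadow_hand` environment name is assumed, not confirmed by this summary):

```
python eureka.py env=shadow_hand iteration=5 sample=16
```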
Highlighted Details
Maintenance & Community
The project is associated with the ICLR 2024 paper "Eureka: Human-Level Reward Design via Coding Large Language Models." No specific community channels (Discord/Slack) or active maintenance signals are provided in the README.
Licensing & Compatibility
Limitations & Caveats
Eureka relies heavily on the OpenAI API, which requires an API key and incurs usage costs. Performance depends on the capabilities of the chosen LLM. The project is research code and not an official NVIDIA product.