SDK for fine-tuning LLM agents with online reinforcement learning
LlamaGym simplifies fine-tuning Large Language Model (LLM) agents with online reinforcement learning (RL) in Gymnasium-compatible environments. It targets researchers and developers who want LLMs to learn and adapt in real time through interaction, abstracting away the complexities of managing conversational context, reward signals, and RL algorithms such as PPO.
How It Works
LlamaGym provides an Agent abstract class that encapsulates the core logic for integrating LLMs with RL. It handles the boilerplate of managing LLM conversation history, batching episodes, assigning rewards, and setting up RL training loops. Users focus on defining agent behavior through the methods get_system_prompt, format_observation, and extract_action, which makes it easier to experiment with prompting and hyperparameters across different RL environments.
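As a rough illustration, a minimal subclass might look like the sketch below. The Agent base class and the three method names come from the project description; the exact signatures and return types are assumptions here, so consult the repository for the real API.

```python
from llamagym import Agent  # base class that handles the RL/LLM boilerplate


class BlackjackAgent(Agent):
    def get_system_prompt(self) -> str:
        # Fixed instruction that frames the task for the LLM
        return "You are playing Blackjack. Reply with 'hit' or 'stay'."

    def format_observation(self, observation) -> str:
        # Turn the Gymnasium observation tuple into a natural-language message
        player_sum, dealer_card, usable_ace = observation
        return (
            f"Your hand totals {player_sum}. The dealer shows {dealer_card}. "
            f"You {'have' if usable_ace else 'do not have'} a usable ace."
        )

    def extract_action(self, response: str) -> int:
        # Map the model's free-text reply back to a discrete Gymnasium action
        return 0 if "stay" in response.lower() else 1
```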
Quick Start & Requirements
pip install llamagym
The quick-start example runs in a Gymnasium environment such as Blackjack (gym.make("Blackjack-v1")).
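Continuing the BlackjackAgent sketch above, a training loop reduces to stepping the environment and feeding rewards back to the agent. The method names act, assign_reward, and terminate_episode below are assumptions for illustration, not confirmed API; check the repository for the exact interface.

```python
import gymnasium as gym

# Constructor arguments (e.g., a Hugging Face model, tokenizer, device) are omitted here.
agent = BlackjackAgent(...)
env = gym.make("Blackjack-v1")

for episode in range(100):
    observation, info = env.reset()
    done = False
    while not done:
        action = agent.act(observation)  # assumed: prompts the LLM and parses an action
        observation, reward, terminated, truncated, info = env.step(action)
        agent.assign_reward(reward)      # assumed: attaches the reward to the current turn
        done = terminated or truncated
    agent.terminate_episode()            # assumed: closes the episode and triggers a PPO update

env.close()
```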
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Online RL convergence is notoriously difficult and requires hyperparameter tuning. The current implementation prioritizes simplicity over compute efficiency compared to frameworks like Lamorel. Supervised fine-tuning before RL is suggested but not yet implemented.