LlamaGym by KhoomeiK

SDK for fine-tuning LLM agents with online reinforcement learning

created 1 year ago
1,208 stars

Top 33.1% on sourcepulse

Project Summary

LlamaGym simplifies the process of fine-tuning Large Language Model (LLM) agents using online reinforcement learning (RL) within Gymnasium-compatible environments. It targets researchers and developers who want LLMs to learn and adapt in real time through interaction, abstracting away the complexities of managing conversational context, reward signals, and RL algorithms like PPO.

How It Works

LlamaGym provides an Agent abstract class that encapsulates the core logic for integrating LLMs with RL. It handles the boilerplate code for managing LLM conversation history, batching episodes, assigning rewards, and setting up RL training loops. This approach allows users to focus on defining agent behavior through methods like get_system_prompt, format_observation, and extract_action, making it easier to experiment with prompting and hyperparameters across various RL environments.
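
For illustration, a minimal subclass might look like the sketch below. The three overridden methods are exactly those named above; the Blackjack-specific prompt text and action parsing are illustrative assumptions, not copied from the repository.

```python
import re

from llamagym import Agent


class BlackjackAgent(Agent):
    def get_system_prompt(self) -> str:
        # Tell the LLM what game it is playing and how to format its action.
        return (
            "You are playing Blackjack. Each turn, reply with 'Action: 0' to stay "
            "or 'Action: 1' to hit."
        )

    def format_observation(self, observation) -> str:
        # Gymnasium's Blackjack observation is (player_sum, dealer_card, usable_ace).
        player_sum, dealer_card, usable_ace = observation
        return (
            f"Your hand totals {player_sum}, the dealer shows {dealer_card}, "
            f"and you {'have' if usable_ace else 'do not have'} a usable ace."
        )

    def extract_action(self, response: str) -> int:
        # Parse the LLM's free-text reply back into a discrete Gymnasium action.
        match = re.search(r"Action: (\d)", response)
        return int(match.group(1)) if match else 0
```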

Quick Start & Requirements

  • Install via pip: pip install llamagym
  • Requires a pre-trained LLM (e.g., Llama-2-7b) and its corresponding tokenizer.
  • Needs a Gymnasium environment (e.g., gym.make("Blackjack-v1")).
  • A CUDA-capable GPU is recommended for efficient LLM fine-tuning.
  • Full example: examples/blackjack.py (a condensed sketch of the loop follows below)
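
As a rough orientation, end-to-end usage amounts to building the agent from a Hugging Face model/tokenizer pair and stepping it through a Gymnasium episode loop. The sketch below follows that pattern; the model checkpoint id, the Agent constructor arguments, and the act / assign_reward / terminate_episode calls mirror the repository's Blackjack example and should be confirmed against examples/blackjack.py.

```python
import gymnasium as gym
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

device = "cuda"  # a GPU is strongly recommended for fine-tuning

# Illustrative checkpoint; any causal LM with a value head wrapper should work.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "meta-llama/Llama-2-7b-hf"
).to(device)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

agent = BlackjackAgent(model, tokenizer, device)  # subclass sketched above
env = gym.make("Blackjack-v1")

for episode in range(1000):
    observation, info = env.reset()
    done = False
    while not done:
        action = agent.act(observation)          # prompt the LLM and parse its action
        observation, reward, terminated, truncated, info = env.step(action)
        agent.assign_reward(reward)              # credit the reward to the current turn
        done = terminated or truncated
    train_stats = agent.terminate_episode()      # triggers a PPO update once a batch is full
```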

Highlighted Details

  • Simplifies online RL fine-tuning for LLM agents.
  • Abstracts LLM context management, reward assignment, and PPO setup.
  • Enables rapid iteration on agent prompting and hyperparameters.
  • Integrates seamlessly with Gymnasium environments.

Maintenance & Community

  • Described as a "weekend project" and "WIP" (Work In Progress).
  • Welcomes contributions.
  • No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. The repository's license file should be consulted for details.

Limitations & Caveats

Online RL convergence is notoriously difficult and requires hyperparameter tuning. The current implementation prioritizes simplicity over compute efficiency compared to frameworks like Lamorel. Supervised fine-tuning before RL is suggested but not yet implemented.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 100 stars in the last 90 days
