LlamaGym by KhoomeiK

SDK for fine-tuning LLM agents with online reinforcement learning

Created 1 year ago
1,230 stars

Top 32.0% on SourcePulse

View on GitHub
Project Summary

LlamaGym simplifies the process of fine-tuning Large Language Model (LLM) agents using online reinforcement learning (RL) within Gymnasium-compatible environments. It targets researchers and developers who want to enable LLMs to learn and adapt in real time through interaction, abstracting away the complexities of managing conversational context, reward signals, and RL algorithms like PPO.

How It Works

LlamaGym provides an Agent abstract class that encapsulates the core logic for integrating LLMs with RL. It handles the boilerplate code for managing LLM conversation history, batching episodes, assigning rewards, and setting up RL training loops. This approach allows users to focus on defining agent behavior through methods like get_system_prompt, format_observation, and extract_action, making it easier to experiment with prompting and hyperparameters across various RL environments.
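
To illustrate, here is a minimal sketch of an agent definition. The Agent base class and the three methods (get_system_prompt, format_observation, extract_action) are named in the summary above; the specific observation handling, return types, and prompt text are illustrative assumptions rather than the library's documented API.

    from llamagym import Agent

    class BlackjackAgent(Agent):
        def get_system_prompt(self) -> str:
            # Frame the task for the LLM once per conversation
            return "You are an expert blackjack player. Reply with 'hit' or 'stay'."

        def format_observation(self, observation) -> str:
            # Turn the Gymnasium observation tuple into natural language
            player_sum, dealer_card, usable_ace = observation
            return (
                f"Your hand totals {player_sum}, the dealer shows {dealer_card}, "
                f"and you {'have' if usable_ace else 'do not have'} a usable ace."
            )

        def extract_action(self, response: str) -> int:
            # Map the model's free-text reply back to a discrete env action
            return 0 if "stay" in response.lower() else 1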

Quick Start & Requirements

  • Install via pip: pip install llamagym
  • Requires a pre-trained LLM (e.g., Llama-2-7b) and its corresponding tokenizer.
  • Needs a Gymnasium environment (e.g., gym.make("Blackjack-v1")).
  • GPU and CUDA are recommended for efficient LLM fine-tuning.
  • Full example: examples/blackjack.py
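
Given those requirements, a training loop might look roughly like the sketch below. The agent methods act, assign_reward, and terminate_episode, the agent constructor arguments, and the model checkpoint name are assumptions based on a typical online-RL loop, not confirmed API names; examples/blackjack.py is the authoritative reference.

    import gymnasium as gym
    import torch
    from transformers import AutoTokenizer
    from trl import AutoModelForCausalLMWithValueHead

    # Assumed setup: a chat-tuned Llama-2 with a value head for PPO (via trl)
    model_name = "meta-llama/Llama-2-7b-chat-hf"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # BlackjackAgent is the subclass sketched earlier; constructor signature assumed
    agent = BlackjackAgent(model, tokenizer, device)
    env = gym.make("Blackjack-v1")

    for episode in range(1000):
        observation, info = env.reset()
        done = False
        while not done:
            action = agent.act(observation)      # hypothetical: LLM chooses the next action
            observation, reward, terminated, truncated, info = env.step(action)
            agent.assign_reward(reward)          # hypothetical: record reward for the PPO update
            done = terminated or truncated
        agent.terminate_episode()                # hypothetical: batch the episode and run a PPO step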

Highlighted Details

  • Simplifies online RL fine-tuning for LLM agents.
  • Abstracts LLM context management, reward assignment, and PPO setup.
  • Enables rapid iteration on agent prompting and hyperparameters.
  • Integrates seamlessly with Gymnasium environments.

Maintenance & Community

  • Described as a "weekend project" and "WIP" (Work In Progress).
  • Welcomes contributions.
  • No specific community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. The repository's license file should be consulted for details.

Limitations & Caveats

Online RL convergence is notoriously difficult and requires careful hyperparameter tuning. Compared with frameworks like Lamorel, the current implementation prioritizes simplicity over compute efficiency. Supervised fine-tuning before RL is suggested but not yet implemented.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 2
  • Star History: 16 stars in the last 30 days

Explore Similar Projects

Starred by Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research) and Will Brown (Research Lead at Prime Intellect).

agent-lightning by microsoft

Train any AI agent with rollouts and feedback

  • Top 6.0% on SourcePulse
  • 2k stars
  • Created 3 months ago
  • Updated 2 days ago