Grounding_LLMs_with_online_RL by flowersteam

RL code for grounding LLMs in BabyAI-Text env

created 2 years ago
268 stars

Top 96.5% on sourcepulse

View on GitHub
Project Summary

This repository provides code and a custom environment for grounding Large Language Models (LLMs) using online Reinforcement Learning (RL), specifically targeting the BabyAI-Text benchmark. It enables researchers and practitioners to fine-tune LLMs for instruction-following tasks within a simulated environment, offering a method to improve their functional understanding and task execution capabilities.

How It Works

The project implements the GLAM method, using the Lamorel library to integrate LLMs into the RL training loop. The LLM acts as the policy: at each step it receives the goal and a textual description of the observation, scores the environment's candidate actions, and is fine-tuned with Proximal Policy Optimization (PPO) through a custom loss function and additional heads. This lets the model learn grounded action selection directly from textual instructions and observations.
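To make the core idea concrete, here is a minimal sketch (not the repository's code, which routes all LLM calls through Lamorel and uses the models from the paper rather than gpt2): the LLM scores each admissible action by the log-probability it assigns to the action text given a prompt built from the goal and the textual observation, and those scores define the policy's action distribution.

```python
# Illustrative GLAM-style action scoring (a sketch, not the repository's implementation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; the paper uses larger LLMs served via Lamorel
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def action_log_probs(prompt: str, actions: list[str]) -> torch.Tensor:
    """Return one summed log-probability per candidate action, conditioned on the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    scores = []
    for action in actions:
        action_ids = tokenizer(" " + action, return_tensors="pt").input_ids
        input_ids = torch.cat([prompt_ids, action_ids], dim=1)
        with torch.no_grad():
            logits = model(input_ids).logits
        # Log-probs of the action tokens, each predicted from the preceding position.
        log_probs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1:-1], dim=-1)
        scores.append(log_probs.gather(1, action_ids[0].unsqueeze(1)).sum())
    return torch.stack(scores)

prompt = "Goal: go to the red ball. Observation: a red ball is 2 steps forward.\nAction:"
policy = torch.softmax(action_log_probs(prompt, ["turn left", "turn right", "go forward"]), dim=0)
```

In GLAM, this per-action distribution is what PPO optimizes against the environment's reward.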

Quick Start & Requirements

  • Installation: Create a conda environment (conda create -n dlp python=3.10.8), activate it (conda activate dlp), install PyTorch (conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch), install the project requirements (pip install -r requirements.txt), and install Lamorel (git clone https://github.com/flowersteam/lamorel.git; cd lamorel/lamorel; pip install -e .; cd ../..). Installation instructions for BabyAI-Text are provided with its package; a minimal interaction sketch follows this list.
  • Prerequisites: Python 3.10.8, PyTorch 1.12.1, CUDA 11.3, and the Lamorel library.
  • Training: Use train_language_agent.py with Lamorel configurations.
  • Evaluation: Use post-training_tests.py.
  • Documentation: Project Website
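For orientation, a minimal interaction loop with BabyAI-Text might look like the sketch below. The environment ID, return signatures, and observation/info keys are assumptions and should be checked against the babyai-text package README.

```python
# Hypothetical BabyAI-Text interaction loop (identifiers are assumptions, not verified).
import gym
import babyai_text  # noqa: F401 -- importing registers the text-augmented BabyAI environments

env = gym.make("BabyAI-MixedTrainLocal-v0")  # assumed environment ID
obs, info = env.reset()                      # assumed return signature
print(obs["mission"])                        # the language goal, e.g. "go to the red ball"
print(info["descriptions"])                  # assumed key holding the textual observation

for _ in range(10):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs, info = env.reset()
```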

Highlighted Details

  • Implements a PPO loss for LLMs plus additional heads for grounded action selection (see the value-head sketch after this list).
  • Supports training and evaluation of various agents, including custom LLM-based agents and BabyAI's built-in bot.
  • Includes scripts for launching experiments on SLURM clusters.
  • Offers behavioral cloning for LLMs as a pre-training step.
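
As a rough illustration of what an "additional head" can look like, the sketch below attaches a scalar value head to a causal LM's final hidden state, since PPO needs a value estimate alongside the policy. The class name and wiring are illustrative assumptions, not the repository's Lamorel-based implementation.

```python
# Illustrative value head on top of a causal LM for PPO-style training (a sketch only).
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

class LMWithValueHead(nn.Module):
    def __init__(self, model_name: str = "gpt2"):
        super().__init__()
        self.lm = AutoModelForCausalLM.from_pretrained(model_name)
        self.value_head = nn.Linear(self.lm.config.hidden_size, 1)  # scalar state-value estimate

    def forward(self, input_ids, attention_mask=None):
        out = self.lm(input_ids, attention_mask=attention_mask, output_hidden_states=True)
        last_hidden = out.hidden_states[-1]          # (batch, seq, hidden)
        value = self.value_head(last_hidden[:, -1])  # value predicted from the last token
        return out.logits, value.squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = LMWithValueHead()
batch = tokenizer(["Goal: go to the red ball. Action:"], return_tensors="pt")
logits, value = model(batch.input_ids, batch.attention_mask)
```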

Maintenance & Community

The project is developed under the flowersteam organization. The README does not document community channels or contribution guidelines.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The README does not specify a license, which may restrict commercial use. The configuration paths for logs and models (saving_path_logs, saving_path_model) are left as ??? in the provided configurations and must be set by the user before training. The project is research-focused, and stability for production environments is not guaranteed.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 13 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Chip Huyen (author of AI Engineering and Designing Machine Learning Systems).

LlamaGym by KhoomeiK

  • SDK for fine-tuning LLM agents with online reinforcement learning
  • Top 0.3% on sourcepulse · 1k stars
  • Created 1 year ago, updated 1 year ago