RL code for grounding LLMs in BabyAI-Text env
Top 96.5% on sourcepulse
This repository provides code and a custom environment for grounding Large Language Models (LLMs) using online Reinforcement Learning (RL), specifically targeting the BabyAI-Text benchmark. It enables researchers and practitioners to fine-tune LLMs for instruction-following tasks within a simulated environment, offering a method to improve their functional understanding and task execution capabilities.
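To make the setting concrete, below is a minimal, hypothetical sketch of interacting with a BabyAI-Text-style environment. The environment id, the import name, the observation fields, and the random action choice are illustrative assumptions, not the repository's actual API.

```python
# Hypothetical interaction loop for a BabyAI-Text-style environment.
# Environment id, import name, and observation fields are assumptions for
# illustration; the real API is defined by the babyai-text package.
import gym
import babyai_text  # noqa: F401 -- assumed to register the text environments

env = gym.make("BabyAI-GoToRedBall-v0")  # assumed environment id
obs = env.reset()

# The instruction ("mission") plus a textual scene description form the prompt
# an LLM policy would consume; here a random action stands in for that policy.
print("Mission:", obs["mission"])

done = False
while not done:
    action = env.action_space.sample()  # an LLM policy would choose here
    obs, reward, done, info = env.step(action)
    # info may carry the textual scene description (assumption).
```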
How It Works
The project implements the GLAM method, leveraging the Lamorel library to integrate LLMs into the RL training loop. LLMs are fine-tuned with Proximal Policy Optimization (PPO) using custom loss functions and additional heads (such as a value head). The LLM itself acts as the policy: it scores the candidate actions given the textual instruction and observation, and this action distribution is optimized online through interaction with the environment, so the model learns grounded actions from text.
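As a rough illustration of this action-scoring idea (a sketch under assumptions, not the repository's implementation), the snippet below uses Hugging Face Transformers to turn per-token log-likelihoods into a distribution over candidate actions. The model name, prompt format, and action list are placeholders.

```python
# Sketch: score each candidate action with an LLM and normalize into a policy.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def action_log_probs(prompt: str, actions: list[str]) -> torch.Tensor:
    """Score each action by the sum of its token log-likelihoods conditioned
    on the prompt, then normalize the scores into a policy over actions."""
    scores = []
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for action in actions:
        action_ids = tokenizer(" " + action, return_tensors="pt").input_ids
        input_ids = torch.cat([prompt_ids, action_ids], dim=1)
        with torch.no_grad():  # gradients would be kept during PPO fine-tuning
            logits = model(input_ids).logits
        log_probs = torch.log_softmax(logits[:, :-1], dim=-1)  # next-token predictions
        targets = input_ids[:, 1:]
        token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
        scores.append(token_lp[:, -action_ids.shape[1]:].sum())  # action tokens only
    return torch.log_softmax(torch.stack(scores), dim=0)

log_pi = action_log_probs(
    "Goal: go to the red ball. Observation: you see a red ball on your left. Action:",
    ["turn left", "turn right", "go forward"],
)
```

During PPO fine-tuning, gradients would flow through these scores, while an additional value head on top of the LLM's representations estimates state values.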
Quick Start & Requirements
Setup:
- Create a conda environment: conda create -n dlp python=3.10.8
- Activate it: conda activate dlp
- Install PyTorch: conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
- Install the project requirements: pip install -r requirements.txt
- Install Lamorel: git clone https://github.com/flowersteam/lamorel.git; cd lamorel/lamorel; pip install -e .; cd ../..
- Install BabyAI-Text: installation details are provided in its package.

Training is launched with train_language_agent.py together with Lamorel configurations; trained agents are evaluated with post-training_tests.py.
Highlighted Details
Maintenance & Community
The project is associated with the flowersteam organization. Further community interaction details are not explicitly provided in the README.
Licensing & Compatibility
The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.
Limitations & Caveats
The README does not specify the exact license, which may impact commercial use. Configuration paths for logs and models (saving_path_logs, saving_path_model) are marked as ??? and require user definition. The project appears to be research-focused, and stability for production environments is not guaranteed.