RL code for grounding LLMs in BabyAI-Text env
Top 96.5% on sourcepulse
This repository provides code and a custom environment for grounding Large Language Models (LLMs) using online Reinforcement Learning (RL), specifically targeting the BabyAI-Text benchmark. It enables researchers and practitioners to fine-tune LLMs for instruction-following tasks within a simulated environment, offering a method to improve their functional understanding and task execution capabilities.
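To make the setting concrete, below is a minimal, hypothetical sketch of interacting with a BabyAI-Text-style environment. The environment id, the import name, the observation fields, and the random action choice are illustrative assumptions, not the repository's actual API.

```python
# Hypothetical interaction loop for a BabyAI-Text-style environment.
# Environment id, import name, and observation fields are assumptions for
# illustration; the real API is defined by the babyai-text package.
import gym
import babyai_text  # noqa: F401 -- assumed to register the text environments

env = gym.make("BabyAI-GoToRedBall-v0")  # assumed environment id
obs = env.reset()

# The instruction ("mission") plus a textual scene description form the prompt
# an LLM policy would consume; here a random action stands in for that policy.
print("Mission:", obs["mission"])

done = False
while not done:
    action = env.action_space.sample()  # an LLM policy would choose here
    obs, reward, done, info = env.step(action)
    # info may carry the textual scene description (assumption).
```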
How It Works
The project implements the GLAM method, leveraging the Lamorel library to integrate LLMs into the RL training loop. LLMs are fine-tuned with Proximal Policy Optimization (PPO) using custom loss functions and additional heads (such as a value head). The LLM itself acts as the policy: it scores the candidate actions given the textual instruction and observation, and this action distribution is optimized online through interaction with the environment, so the model learns grounded actions from text.
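As a rough illustration of this action-scoring idea (a sketch under assumptions, not the repository's implementation), the snippet below uses Hugging Face Transformers to turn per-token log-likelihoods into a distribution over candidate actions. The model name, prompt format, and action list are placeholders.

```python
# Sketch: score each candidate action with an LLM and normalize into a policy.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def action_log_probs(prompt: str, actions: list[str]) -> torch.Tensor:
    """Score each action by the sum of its token log-likelihoods conditioned
    on the prompt, then normalize the scores into a policy over actions."""
    scores = []
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for action in actions:
        action_ids = tokenizer(" " + action, return_tensors="pt").input_ids
        input_ids = torch.cat([prompt_ids, action_ids], dim=1)
        with torch.no_grad():  # gradients would be kept during PPO fine-tuning
            logits = model(input_ids).logits
        log_probs = torch.log_softmax(logits[:, :-1], dim=-1)  # next-token predictions
        targets = input_ids[:, 1:]
        token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
        scores.append(token_lp[:, -action_ids.shape[1]:].sum())  # action tokens only
    return torch.log_softmax(torch.stack(scores), dim=0)

log_pi = action_log_probs(
    "Goal: go to the red ball. Observation: you see a red ball on your left. Action:",
    ["turn left", "turn right", "go forward"],
)
```

During PPO fine-tuning, gradients would flow through these scores, while an additional value head on top of the LLM's representations estimates state values.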
Quick Start & Requirements
Setup:
- Create a conda environment: conda create -n dlp python=3.10.8
- Activate it: conda activate dlp
- Install PyTorch: conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
- Install the project requirements: pip install -r requirements.txt
- Install Lamorel: git clone https://github.com/flowersteam/lamorel.git; cd lamorel/lamorel; pip install -e .; cd ../..
- Install BabyAI-Text: installation details are provided in its package.

Training is launched with train_language_agent.py together with Lamorel configurations; trained agents are evaluated with post-training_tests.py.
Highlighted Details
Maintenance & Community
The project is associated with the flowersteam organization. Further community interaction details are not explicitly provided in the README.
Licensing & Compatibility
The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.
Limitations & Caveats
The README does not specify the exact license, which may impact commercial use. Configuration paths for logs and models (saving_path_logs, saving_path_model) are marked as ??? and require user definition. The project appears to be research-focused, and stability for production environments is not guaranteed.