sweet_rl  by facebookresearch

LLM agents trained for collaborative reasoning

Created 1 year ago
264 stars

Top 96.7% on SourcePulse

View on GitHub
Project Summary

This repository provides the official implementation of SWEET-RL and the ColBench benchmark, addressing the challenge of effective credit assignment in multi-turn LLM agent interactions. It targets researchers and developers building collaborative LLM agents, offering a novel RL algorithm that significantly improves performance on complex reasoning tasks and enables smaller models to rival state-of-the-art systems.

How It Works

SWEET-RL trains a novel critic model with access to additional training-time information, allowing it to provide step-level rewards. This improves credit assignment across multiple turns, a key limitation of prior multi-turn RL methods. The framework builds on a custom fork of OpenRLHF, incorporating multi-turn Direct Preference Optimization (DPO) and length normalization to optimize LLM agents for collaborative reasoning.
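The length-normalized preference objective mentioned above can be sketched as follows. This is an illustrative simplification, not the repository's implementation: the function name is hypothetical, and the reference-policy log-ratios and per-turn batching used in full DPO are omitted for brevity.

```python
import math


def dpo_loss_length_normalized(logp_chosen: float, len_chosen: int,
                               logp_rejected: float, len_rejected: int,
                               beta: float = 0.1) -> float:
    """Sketch of a DPO-style loss with length normalization.

    Sequence log-probabilities are divided by token count so that longer
    responses are not preferred merely for accumulating more
    log-probability mass.
    """
    # Per-token average log-probabilities for each response.
    margin = beta * (logp_chosen / len_chosen - logp_rejected / len_rejected)
    # Standard DPO objective: negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the preferred response has a higher per-token log-probability, the margin is positive and the loss is small; the gradient pushes the policy to widen that margin.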

Quick Start & Requirements

Installation involves cloning the repository and installing dependencies via `pip install -e .`. Setting up the environment requires task-specific prerequisites:

  • Backend Programming: a vLLM server simulating a human collaborator (e.g., using Llama-3.1-70B-Instruct) and the facebook/collaborative_agent_bench dataset.
  • Frontend Design: GeckoDriver and Firefox, a vLLM server running a vision model (e.g., Qwen2-VL-72B-Instruct), and the HuggingFaceM4/WebSight dataset.

Both tasks require significant GPU resources for the vLLM servers. The associated paper is available at arXiv:2503.15478.
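The setup steps above can be sketched as a short shell session. The repository URL follows the stated GitHub org, but the exact `vllm serve` flags are assumptions based on common vLLM usage, not commands verified against the README.

```shell
# Clone the repository and install it in editable mode.
git clone https://github.com/facebookresearch/sweet_rl.git
cd sweet_rl
pip install -e .

# Launch a vLLM server to simulate the human collaborator for the
# backend-programming task (model per the README's example; the serve
# flags here are illustrative and depend on available GPUs).
vllm serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 8
```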

Highlighted Details

  • SWEET-RL achieves a 6% absolute improvement in success and win rates on the ColBench benchmark.
  • It enables Llama-3.1-8B to match or exceed GPT-4o performance on realistic collaborative content-creation tasks.
  • Introduces the ColBench benchmark, featuring backend programming and frontend design scenarios.
  • Leverages a custom OpenRLHF fork supporting multi-turn DPO and length normalization.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap are provided in the README excerpt.

Licensing & Compatibility

The project is released under the CC-By-NC (Creative Commons Attribution-NonCommercial) license. This license restricts usage to non-commercial purposes, impacting its compatibility with commercial applications or closed-source projects requiring commercial use.

Limitations & Caveats

The setup process is complex, requiring the deployment of VLLM servers and specific model configurations. The CC-By-NC license strictly prohibits commercial use. Further details on known bugs, alpha status, or deprecations are not specified in the provided documentation.

Health Check
Last Commit

10 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
3 stars in the last 30 days
