sweet_rl  by facebookresearch

LLM agents trained for collaborative reasoning

Created 1 year ago
264 stars

Top 96.7% on SourcePulse

View on GitHub
Project Summary

This repository provides the official implementation of SWEET-RL and the ColBench benchmark, addressing the challenge of effective credit assignment in multi-turn LLM agent interactions. It targets researchers and developers building collaborative LLM agents, offering a novel RL algorithm that significantly improves performance on complex reasoning tasks and enables smaller models to rival state-of-the-art systems.

How It Works

SWEET-RL trains a novel critic model with access to additional training-time information, allowing it to provide step-level rewards. This improves credit assignment across multiple turns, a key limitation of prior multi-turn RL methods. The framework builds on a custom fork of OpenRLHF, incorporating multi-turn Direct Preference Optimization (DPO) and length normalization to optimize LLM agents for collaborative reasoning.
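The length-normalized preference objective mentioned above can be sketched as follows. This is an illustrative simplification, not the repository's implementation: the function name is hypothetical, and the reference-policy log-ratios and per-turn batching used in full DPO are omitted for brevity.

```python
import math


def dpo_loss_length_normalized(logp_chosen: float, len_chosen: int,
                               logp_rejected: float, len_rejected: int,
                               beta: float = 0.1) -> float:
    """Sketch of a DPO-style loss with length normalization.

    Sequence log-probabilities are divided by token count so that longer
    responses are not preferred merely for accumulating more
    log-probability mass.
    """
    # Per-token average log-probabilities for each response.
    margin = beta * (logp_chosen / len_chosen - logp_rejected / len_rejected)
    # Standard DPO objective: negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the preferred response has a higher per-token log-probability, the margin is positive and the loss is small; the gradient pushes the policy to widen that margin.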

Quick Start & Requirements

Installation involves cloning the repository and installing dependencies via `pip install -e .`. Setting up the environment requires task-specific prerequisites:

  • Backend Programming: a vLLM server simulating a human collaborator (e.g., using Llama-3.1-70B-Instruct) and the facebook/collaborative_agent_bench dataset.
  • Frontend Design: GeckoDriver and Firefox, a vLLM server running a vision model (e.g., Qwen2-VL-72B-Instruct), and the HuggingFaceM4/WebSight dataset.

Both tasks require significant GPU resources for the vLLM servers. The associated paper is available at arXiv:2503.15478.
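The setup steps above can be sketched as a short shell session. The repository URL follows the stated GitHub org, but the exact `vllm serve` flags are assumptions based on common vLLM usage, not commands verified against the README.

```shell
# Clone the repository and install it in editable mode.
git clone https://github.com/facebookresearch/sweet_rl.git
cd sweet_rl
pip install -e .

# Launch a vLLM server to simulate the human collaborator for the
# backend-programming task (model per the README's example; the serve
# flags here are illustrative and depend on available GPUs).
vllm serve meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 8
```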

Highlighted Details

  • SWEET-RL achieves a 6% absolute improvement in success and win rates on the ColBench benchmark.
  • It enables Llama-3.1-8B to match or exceed GPT-4o performance on realistic collaborative content-creation tasks.
  • Introduces the ColBench benchmark, featuring backend programming and frontend design scenarios.
  • Leverages a custom OpenRLHF fork supporting multi-turn DPO and length normalization.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap are provided in the README excerpt.

Licensing & Compatibility

The project is released under the CC-By-NC (Creative Commons Attribution-NonCommercial) license. This license restricts usage to non-commercial purposes, impacting its compatibility with commercial applications or closed-source projects requiring commercial use.

Limitations & Caveats

The setup process is complex, requiring the deployment of VLLM servers and specific model configurations. The CC-By-NC license strictly prohibits commercial use. Further details on known bugs, alpha status, or deprecations are not specified in the provided documentation.

Health Check
Last Commit

10 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
3 stars in the last 30 days
