sweet_rl by facebookresearch

LLM agents trained for collaborative reasoning

Created 8 months ago
252 stars

Top 99.6% on SourcePulse

View on GitHub
Project Summary

This repository provides the official implementation for SWEET-RL and the ColBench benchmark, addressing the challenge of effective credit assignment in multi-turn LLM agent interactions. It targets researchers and developers building collaborative LLM agents, offering a novel RL algorithm that significantly enhances performance on complex reasoning tasks, enabling smaller models to rival state-of-the-art systems.

How It Works

SWEET-RL trains a critic model with access to additional training-time information and uses it to provide step-level rewards. This improves credit assignment across multiple turns, a key weakness of prior multi-turn RL methods. The framework builds on a custom fork of openrlhf, incorporating multi-turn Direct Preference Optimization (DPO) and length normalization to optimize LLM agents for collaborative reasoning.
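To make the credit-assignment idea concrete, here is a minimal sketch under stated assumptions: critic, agent_turns, and training_time_info are illustrative names rather than the repository's API. A turn-level critic scores each partial trajectory with access to training-time-only context (such as a reference solution), and the differences between consecutive values act as step-level rewards.

    # Minimal sketch (illustrative, not the repository's API): turning a learned
    # turn-level critic into step-level rewards for credit assignment.
    def step_level_rewards(critic, agent_turns, training_time_info):
        # critic: hypothetical callable mapping (partial trajectory, training-time
        # context such as the reference solution) to a scalar value estimate.
        values = [critic(agent_turns[:t], training_time_info)
                  for t in range(len(agent_turns) + 1)]
        # The reward for turn t is the change in estimated value caused by that turn.
        return [after - before for before, after in zip(values, values[1:])]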

Quick Start & Requirements

Installation involves cloning the repository and installing dependencies with pip install -e . in the cloned directory. Setting up the environment requires specific prerequisites depending on the task:

  • Backend Programming: a vLLM server simulating a human collaborator (e.g., Llama-3.1-70B-Instruct) and the facebook/collaborative_agent_bench dataset.
  • Frontend Design: GeckoDriver and Firefox, a vLLM server hosting a vision model (e.g., Qwen2-VL-72B-Instruct), and the HuggingFaceM4/WebSight dataset.

Both tasks require significant GPU resources for the vLLM servers. The associated paper is available at arXiv:2503.15478.
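The simulated collaborator is typically reached through vLLM's OpenAI-compatible endpoint. The sketch below is a minimal illustration of querying such a server; the launch command, host, port, and prompts are assumptions for illustration, not the repository's configuration.

    # Minimal sketch: querying a vLLM server that plays the human collaborator.
    # Assumes the server was started with vLLM's OpenAI-compatible API, e.g.
    #   vllm serve meta-llama/Llama-3.1-70B-Instruct --port 8000
    # Endpoint, model name, and prompts are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",
        messages=[
            {"role": "system",
             "content": "You are a human collaborator who knows the hidden task "
                        "specification and answers the agent's questions."},
            {"role": "user",
             "content": "Should the function handle empty input lists?"},
        ],
        temperature=0.7,
    )
    print(response.choices[0].message.content)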

Highlighted Details

  • SWEET-RL achieves a 6% absolute improvement in success and win rates on the ColBench benchmark.
  • It enables Llama-3.1-8B to match or exceed GPT-4o performance on realistic collaborative content creation tasks.
  • Introduces the ColBench benchmark, featuring backend programming and frontend design scenarios.
  • Leverages a custom openrlhf fork supporting multi-turn DPO and length normalization.
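As a rough illustration of the last point, the sketch below shows what a length-normalized DPO objective can look like; the exact loss implemented in the openrlhf fork may differ, and all names and the beta value are illustrative.

    # Minimal sketch of a length-normalized DPO loss (illustrative; the fork's
    # exact objective may differ).
    import torch.nn.functional as F

    def length_normalized_dpo_loss(
        policy_chosen_logps, policy_rejected_logps,  # summed log-probs of agent tokens
        ref_chosen_logps, ref_rejected_logps,        # same, under the frozen reference model
        chosen_lengths, rejected_lengths,            # agent-token counts per response
        beta=0.1,
    ):
        # Normalizing by response length keeps longer multi-turn responses from
        # being implicitly favored or penalized by raw log-prob sums.
        chosen = (policy_chosen_logps - ref_chosen_logps) / chosen_lengths
        rejected = (policy_rejected_logps - ref_rejected_logps) / rejected_lengths
        return -F.logsigmoid(beta * (chosen - rejected)).mean()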

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap are provided in the README excerpt.

Licensing & Compatibility

The project is released under the CC BY-NC (Creative Commons Attribution-NonCommercial) license, which restricts usage to non-commercial purposes and therefore rules out incorporation into commercial applications.

Limitations & Caveats

The setup process is complex, requiring the deployment of vLLM servers and specific model configurations. The CC BY-NC license prohibits commercial use. Known bugs, alpha status, or deprecations are not documented in the provided README excerpt.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days
