Discover and explore top open-source AI tools and projects—updated daily.
facebookresearchLLM agents trained for collaborative reasoning
Top 99.6% on SourcePulse
<2-3 sentences summarising what the project addresses and solves, the target audience, and the benefit.> This repository provides the official implementation for SWEET-RL and the ColBench benchmark, addressing the challenge of effective credit assignment in multi-turn LLM agent interactions. It targets researchers and developers building collaborative LLM agents, offering a novel RL algorithm that significantly enhances performance on complex reasoning tasks, enabling smaller models to rival state-of-the-art systems.
How It Works
SWEET-RL employs a novel critic model trained with additional training-time information to provide step-level rewards. This approach improves credit assignment over multiple turns, a key limitation in prior multi-turn RL methods. The framework builds on a custom fork of openrlhf, incorporating multi-turn Direct Preference Optimization (DPO) and length normalization to optimize LLM agents for collaborative reasoning.
Quick Start & Requirements
Installation involves cloning the repository and installing dependencies via pip install -e .. Setting up the environment requires specific prerequisites depending on the task:
facebook/collaborative_agent_bench dataset.HuggingFaceM4/WebSight dataset.
Both tasks require significant GPU resources for the VLLM servers. The associated paper can be found at arXiv:2503.15478.Highlighted Details
openrlhf fork supporting multi-turn DPO and length normalization.Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap are provided in the README excerpt.
Licensing & Compatibility
The project is released under the CC-By-NC (Creative Commons Attribution-NonCommercial) license. This license restricts usage to non-commercial purposes, impacting its compatibility with commercial applications or closed-source projects requiring commercial use.
Limitations & Caveats
The setup process is complex, requiring the deployment of VLLM servers and specific model configurations. The CC-By-NC license strictly prohibits commercial use. Further details on known bugs, alpha status, or deprecations are not specified in the provided documentation.
6 months ago
Inactive
KhoomeiK
THUDM
langchain-ai