RL tuning framework for LLM agents, inspired by DeepSeek-R1
OpenManus-RL is an open-source initiative focused on advancing Reinforcement Learning (RL) tuning for Large Language Model (LLM) agents. It aims to improve agent reasoning, decision-making, and tool integration capabilities, targeting researchers and developers in the LLM agent space. The project offers a platform for live-streamed development, sharing progress, datasets, and tuned models for benchmarks like GAIA and AgentBench.
How It Works
The project employs an RL-based tuning framework, inspired by RAGEN's RICO, to enhance LLM agents. It explores novel algorithmic structures, diverse reasoning paradigms (e.g., ReAct, Outcome-based), and sophisticated reward strategies (e.g., format-based, outcome-based). The approach integrates various rollout strategies like Tree-of-Thoughts and Monte Carlo Tree Search, and post-training methods including SFT, GRPO, PPO, and DPO, aiming for robust agent performance across multiple benchmarks.
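To make the reward side concrete, the sketch below shows one way a format-based and an outcome-based signal could be combined into a single scalar reward per rollout. The function names, regex pattern, and weights are illustrative assumptions, not OpenManus-RL's actual implementation.

```python
# Illustrative sketch only: OpenManus-RL's real reward code may differ.
# Combines the two signals the README mentions: a format reward (well-formed
# ReAct-style trace) and an outcome reward (did the episode succeed?).
import re

def format_reward(response: str) -> float:
    """Reward well-formed ReAct-style output: a Thought followed by an Action."""
    pattern = r"Thought:.*?Action:.*"
    return 1.0 if re.search(pattern, response, flags=re.DOTALL) else 0.0

def outcome_reward(task_success: bool) -> float:
    """Sparse terminal reward from the environment (e.g., WebShop task solved)."""
    return 1.0 if task_success else 0.0

def combined_reward(response: str, task_success: bool,
                    w_format: float = 0.2, w_outcome: float = 0.8) -> float:
    """Weighted mix of format and outcome signals; weights are illustrative."""
    return w_format * format_reward(response) + w_outcome * outcome_reward(task_success)

# Example: a valid ReAct trace whose episode nonetheless failed the task
print(combined_reward("Thought: search for a red mug\nAction: search[red mug]",
                      task_success=False))
```

In a GRPO- or PPO-style post-training loop, a scalar like this would be computed for each rollout and then converted into advantages for the policy update.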
Quick Start & Requirements
Install with `pip install -e .` after creating and activating a conda environment with Python 3.11. Key dependencies include torch (`torch==2.4.0`), vllm (`vllm==0.6.3`), flash-attn, and wandb. Specific environments such as WebShop require additional setup (`environment.yml`, `./setup.sh`).
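Before launching any tuning run, a short sanity check like the illustrative snippet below can confirm the pinned dependencies are installed; the version numbers are taken from the README and may change as the project evolves.

```python
# Illustrative environment check for the dependency pins listed above.
import importlib.metadata as md

expected = {"torch": "2.4.0", "vllm": "0.6.3"}
for pkg, want in expected.items():
    try:
        have = md.version(pkg)
    except md.PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED (expected {want})")
        continue
    status = "OK" if have == want else f"version mismatch (have {have}, want {want})"
    print(f"{pkg}: {status}")
```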
Highlighted Details
Maintenance & Community
Collaborative effort between Ulab-UIUC and MetaGPT. Community contributions are welcomed via issues and pull requests. Contact: kunlunz2@illinois.edu.
Licensing & Compatibility
The project is hosted on GitHub and cites a custom MIT-like license for its own code. However, it integrates with other projects and datasets, whose licenses should be independently verified for commercial use or closed-source linking.
Limitations & Caveats
The README states that the library for SFT and GRPO tuning is still under development. The "Code and dataset coming soon!" note from the initial description suggests potential incompleteness or ongoing development of core components.