WebRL by THUDM

Framework for training LLM web agents using reinforcement learning

created 9 months ago
430 stars

Project Summary

WebRL provides a framework for training Large Language Model (LLM) web agents using a self-evolving online curriculum reinforcement learning technique. It targets the WebArena environment, enabling agents to learn complex web navigation and task completion skills. The project is suitable for researchers and developers focused on autonomous agents and LLM-powered web interaction.

How It Works

WebRL employs a reinforcement learning approach where the agent learns to interact with web environments. A key innovation is the self-evolving online curriculum, which dynamically adjusts the learning tasks to optimize agent progress. This curriculum generation, coupled with an Outcome-supervised Reward Model (ORM), allows for efficient and effective training of web agents.
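The loop below is a minimal, hypothetical sketch of that idea, not WebRL's actual implementation (all function names, the skill/difficulty scoring rule, and the update step are illustrative): failed tasks seed the next round's curriculum, an ORM-style scorer labels rollout outcomes, and only approved trajectories feed the policy update.

```python
import random

def orm_score(trajectory):
    """Stand-in for the Outcome-supervised Reward Model (ORM): a binary
    outcome reward of 1.0 for a completed task, 0.0 otherwise."""
    return 1.0 if trajectory["completed"] else 0.0

def rollout(policy_skill, task):
    """Toy web-agent rollout: success is more likely when the policy's
    skill exceeds the task's difficulty."""
    p_success = policy_skill / (policy_skill + task["difficulty"])
    return {"task": task, "completed": random.random() < p_success}

def evolve_curriculum(failed_tasks, n_new):
    """Self-evolving step: propose new tasks by perturbing recent failures,
    keeping difficulty near the agent's current frontier."""
    return [
        {"difficulty": max(0.1, random.choice(failed_tasks)["difficulty"]
                           * random.uniform(0.8, 1.2))}
        for _ in range(n_new)
    ]

def train(n_rounds=5, batch=32):
    random.seed(0)
    skill = 1.0                                   # proxy for policy parameters
    tasks = [{"difficulty": random.uniform(0.5, 2.0)} for _ in range(batch)]
    replay = []
    for _ in range(n_rounds):
        trajectories = [rollout(skill, t) for t in tasks]
        # Keep only ORM-approved trajectories for the policy update.
        kept = [tr for tr in trajectories if orm_score(tr) > 0.5]
        replay.extend(kept)
        skill += 0.05 * len(kept) / batch         # stand-in for a gradient step
        failed = [tr["task"] for tr in trajectories if not tr["completed"]]
        tasks = evolve_curriculum(failed or tasks, batch)
    return skill, len(replay)
```

The point of the sketch is the control flow: curriculum generation is driven by the agent's own failures, so task difficulty tracks the agent's ability as training progresses.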

Quick Start & Requirements

  • Install: Create a conda environment (python==3.10), activate it, cd into the WebRL directory, and run pip install -e .
  • Dependencies: Python 3.10, Conda.
  • Models: Requires downloading pre-trained checkpoints for Actor (e.g., WebRL-GLM-4-9B, WebRL-LLaMA-3.1-8B) and ORM (e.g., ORM-Llama-3.1-8B).
  • Environment: Interaction with WebArena is facilitated via VAB-WebArena-Lite.
  • Docs: 📃 Paper

Highlighted Details

  • Supports multiple LLM backbones including GLM and LLaMA variants.
  • Includes scripts for SFT baseline training using LLaMA-Factory.
  • Provides detailed procedures for generating new instructions and processing interaction data with reward labeling.
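As a rough illustration of the reward-labeling step (the function and field names here are hypothetical, not WebRL's actual pipeline), each interaction trajectory receives an outcome reward from the reward model, and trajectories are split by that label before training:

```python
def toy_orm(trajectory):
    """Hypothetical stand-in for the ORM: judges success from the final state."""
    return 1.0 if trajectory.get("final_state") == trajectory.get("goal") else 0.0

def label_trajectories(trajectories, reward_model, threshold=0.5):
    """Attach an outcome reward to each trajectory and split positives
    (candidates for policy training) from negatives."""
    positives, negatives = [], []
    for traj in trajectories:
        traj["reward"] = reward_model(traj)
        (positives if traj["reward"] >= threshold else negatives).append(traj)
    return positives, negatives
```

In the actual project the judge is a trained LLM checkpoint (e.g., ORM-Llama-3.1-8B) rather than a rule; the sketch only shows the label-then-filter shape such processing plausibly takes.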

Maintenance & Community

  • The project is associated with THUDM.
  • Links to model checkpoints are provided on Hugging Face and ModelScope.

Licensing & Compatibility

  • The README does not explicitly state a license, and compatibility for commercial use is therefore unspecified. The accompanying paper is an arXiv preprint.

Limitations & Caveats

The project is presented as a research preprint, and its stability, long-term maintenance, and production readiness are not yet established. Specific hardware requirements for training and inference are not detailed.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 61 stars in the last 90 days
