WebAgent by Alibaba-NLP

Benchmark for LLMs in web traversal

created 6 months ago
5,436 stars

Top 9.4% on sourcepulse

Project Summary

This repository introduces WebWalker, a benchmark and framework for evaluating Large Language Models (LLMs) in complex web traversal tasks. It addresses the challenge of long-context information seeking by proposing a multi-agent approach for effective memory management, targeting researchers and developers working on LLM-powered web navigation and information retrieval.

How It Works

WebWalker uses a multi-agent framework designed for efficient memory management in long-context web traversal. This approach aims to overcome the limitations of standard LLMs in handling extended interactions and complex navigation paths across multiple web pages. The system builds on ReAct, Qwen-Agent, and LangChain for agent interaction, and uses Crawl4AI for web crawling.
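The idea of separating navigation from memory can be illustrated with a toy sketch. It is not the repository's implementation: it assumes one "explorer" role that follows links and one "critic" role that decides what to keep in memory and when to stop, and it replaces every LLM call with simple keyword matching. The names `SITE`, `Memory`, `critic`, and `explore` are hypothetical, and the pages are made-up data.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical in-memory "website": page text plus outgoing links.
SITE = {
    "/": {"text": "Welcome to the conference site.", "links": ["/dates", "/venue"]},
    "/dates": {"text": "Paper submission deadline: 15 February.", "links": ["/"]},
    "/venue": {"text": "The venue is the city convention center.", "links": ["/"]},
}


@dataclass
class Memory:
    """Compact store of relevant observations, kept instead of full page history."""
    notes: list = field(default_factory=list)


def critic(keywords, page_text, memory):
    """Stand-in for an LLM critic (assumption: keyword matching, not a model call).

    Stores the page text only if it mentions a keyword, then reports whether
    every keyword is covered by the stored notes.
    """
    lowered = page_text.lower()
    if any(k in lowered for k in keywords):
        memory.notes.append(page_text)
    joined = " ".join(memory.notes).lower()
    return all(k in joined for k in keywords)


def explore(keywords, start="/", max_steps=10):
    """Stand-in for an LLM explorer: breadth-first link following with a step budget."""
    memory = Memory()
    queue, visited, path = deque([start]), set(), []
    while queue and len(path) < max_steps:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        path.append(url)
        page = SITE[url]
        if critic(keywords, page["text"], memory):
            break  # the critic judges the memory sufficient to answer
        queue.extend(link for link in page["links"] if link not in visited)
    return memory.notes, path


if __name__ == "__main__":
    notes, path = explore({"deadline"})
    print(path)
    print(notes)
```

Only the notes selected by the critic are carried forward, so the context passed between steps stays small even as the number of visited pages grows.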

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (conda create -n webwalker python=3.10), install requirements (pip install -r requirements.txt), and run post-installation setup (crawl4ai-setup, crawl4ai-doctor).
  • Prerequisites: Python 3.10, OpenAI or Dashscope API key (export as environment variable).
  • Running Locally: cd src then streamlit run app.py.
  • Resources: Requires API keys for LLM access.
  • Demos: ModelScope online demo, Hugging Face online demo
  • Dataset: WebWalkerQA dataset
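Before launching the app, the prerequisites listed above can be checked with a short script. This is a minimal sketch, not part of the repository: the function name `preflight` is hypothetical, and the environment variable names `OPENAI_API_KEY` and `DASHSCOPE_API_KEY` are assumptions, so confirm the exact names against the repository README.

```python
import os
import sys

# Assumed variable names; the README may expect different or additional ones.
ASSUMED_KEY_VARS = ("OPENAI_API_KEY", "DASHSCOPE_API_KEY")


def preflight(env=None, version=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    problems = []
    if version != (3, 10):
        problems.append(f"Python 3.10 expected, found {version[0]}.{version[1]}")
    # One key is enough: the summary lists OpenAI or Dashscope.
    if not any(env.get(name) for name in ASSUMED_KEY_VARS):
        problems.append("no API key found in " + " or ".join(ASSUMED_KEY_VARS))
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Passing `env` and `version` explicitly makes the check easy to exercise without changing the real environment.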

Highlighted Details

  • Introduces the WebWalkerQA benchmark dataset with 680 queries across 1373 webpages.
  • Proposes a multi-agent framework for improved memory management in web traversal.
  • Supports evaluation using GPT-4 for answer accuracy.
  • Leverages Crawl4AI for web page data acquisition in a Markdown-like format.

Maintenance & Community

The listed contributor is Jialong Wu (jialongwu@alibaba-inc.com, jialongwu@seu.edu.cn). The project acknowledges ReAct, Qwen-Agent, and LangChain.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README mentions a collection of approximately 14k "silver QA pairs" which are not yet human-verified. The project is associated with a preprint, indicating it may be under active development.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 10
  • Issues (30d): 35
  • Star history: 5,257 stars in the last 90 days
