DeepResearch by Alibaba-NLP

Benchmark for LLMs in web traversal

Created 1 year ago

17,892 stars

Top 2.6% on SourcePulse

6 Experts Love This Project

Jeff Hammerbacher

Cofounder of Cloudera

Yaowei Zheng

Author of LLaMA-Factory

and 2 more!

Project Summary

This repository introduces WebWalker, a framework for evaluating Large Language Models (LLMs) on complex web traversal tasks, together with the WebWalkerQA benchmark. It addresses long-context information seeking with a multi-agent approach to memory management, and targets researchers and developers working on LLM-powered web navigation and information retrieval.

How It Works

WebWalker uses a multi-agent framework designed for efficient memory management in long-context web traversal. The goal is to overcome the difficulty standard LLMs have with extended interactions and navigation paths that span multiple web pages. The system builds on ReAct, Qwen-Agent, and LangChain for agent interaction, and on Crawl4AI for web crawling.
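The memory-management idea can be illustrated with a small, LLM-free sketch. It is not the repository's implementation: the toy site, the `Critic` class, the `explore` function, and the keyword test for relevance are all hypothetical stand-ins chosen for illustration. One component walks links page by page, while a second keeps only the observations that look relevant, so the working memory stays small even when many pages are visited.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical toy site: URL -> (page text, outgoing links).
TOY_SITE = {
    "/": ("Welcome to the conference site.", ["/about", "/program"]),
    "/about": ("The venue is the City Hall.", []),
    "/program": ("See the schedule for session details.", ["/program/keynote"]),
    "/program/keynote": ("The keynote starts at 9:00.", []),
}


@dataclass
class Critic:
    """Keeps only observations that look relevant to the query (assumed design)."""
    keywords: set[str]          # lowercase terms that mark a page as relevant
    needed: int = 1             # how many relevant observations count as "enough"
    memory: list[tuple[str, str]] = field(default_factory=list)

    def observe(self, url: str, text: str) -> None:
        # A real system would ask an LLM; a keyword test stands in here.
        if any(keyword in text.lower() for keyword in self.keywords):
            self.memory.append((url, text))

    def can_answer(self) -> bool:
        return len(self.memory) >= self.needed


def explore(site: dict, start: str, critic: Critic, max_steps: int = 10) -> list[str]:
    """Breadth-first traversal that stops once the critic has enough evidence."""
    queue = deque([start])
    visited: list[str] = []
    while queue and len(visited) < max_steps:
        url = queue.popleft()
        if url in visited:
            continue
        visited.append(url)
        text, links = site[url]
        critic.observe(url, text)
        if critic.can_answer():
            break
        queue.extend(links)
    return visited


critic = Critic(keywords={"keynote"})
visited = explore(TOY_SITE, "/", critic)
print(visited)        # pages walked, in order
print(critic.memory)  # only the relevant observation is retained
```

In this run four pages are visited but only one observation is stored, which is the point of separating traversal from memory.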

Quick Start & Requirements

Installation: Clone the repository, create a conda environment (conda create -n webwalker python=3.10), activate it, install the requirements (pip install -r requirements.txt), then run the Crawl4AI post-installation commands (crawl4ai-setup, then crawl4ai-doctor).
Prerequisites: Python 3.10 and an OpenAI or Dashscope API key, exported as an environment variable.
Running Locally: cd src, then streamlit run app.py.
Resources: LLM access requires the API key listed above.
Demos: ModelScope online demo, Hugging Face online demo
Dataset: WebWalkerQA dataset
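Before launching the app, the prerequisites above can be checked with a short preflight script. This is a sketch under stated assumptions: the environment variable names OPENAI_API_KEY and DASHSCOPE_API_KEY are the conventional names for these providers and are not confirmed by the summary, and the function name `missing_prerequisites` is hypothetical.

```python
import os
import sys
from typing import Mapping, Sequence

# Assumed variable names; check the project's README for the exact ones.
API_KEY_VARS = ("OPENAI_API_KEY", "DASHSCOPE_API_KEY")
EXPECTED_PYTHON = (3, 10)


def missing_prerequisites(environ: Mapping[str, str],
                          version_info: Sequence[int]) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version_info[:2]) != EXPECTED_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.10 expected, found {found}")
    # At least one of the two keys must be set to a non-empty value.
    if not any(environ.get(name) for name in API_KEY_VARS):
        problems.append("set one of: " + ", ".join(API_KEY_VARS))
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ, sys.version_info):
        print("preflight:", problem)
```

Passing the environment and version in as arguments keeps the check easy to test without touching the real shell environment.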

Highlighted Details

Introduces the WebWalkerQA benchmark dataset, with 680 queries across 1,373 webpages.
Proposes a multi-agent framework for improved memory management in web traversal.
Supports evaluation with GPT-4 as the judge of answer accuracy.
Uses Crawl4AI to acquire web page data in a Markdown-like format.
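Answer accuracy under a judge can be sketched as the fraction of predictions the judge accepts. The code below is illustrative only: the project uses GPT-4 as the judge, whereas this sketch swaps in a normalized substring match so that it runs without an API key. The names `Judge`, `normalized_match`, and `accuracy`, and the sample question-answer triples, are hypothetical.

```python
from typing import Callable, Iterable

# A judge takes (question, reference answer, predicted answer) and returns
# True when the prediction should count as correct.
Judge = Callable[[str, str, str], bool]


def normalized_match(question: str, reference: str, prediction: str) -> bool:
    """Stand-in judge: case- and whitespace-insensitive substring match."""
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())
    return normalize(reference) in normalize(prediction)


def accuracy(examples: Iterable[tuple[str, str, str]], judge: Judge) -> float:
    """Fraction of examples the judge accepts; 0.0 for an empty input."""
    examples = list(examples)
    if not examples:
        return 0.0
    correct = sum(1 for q, ref, pred in examples if judge(q, ref, pred))
    return correct / len(examples)


# Made-up examples for illustration, not taken from WebWalkerQA.
samples = [
    ("When does the keynote start?", "9:00", "The keynote starts at 9:00."),
    ("Where is the venue?", "City Hall", "It is held at the  city hall"),
    ("Who chairs the session?", "Dr. Lee", "I could not find this."),
    ("What is the deadline?", "May 1", "The deadline is May 1."),
]
print(accuracy(samples, normalized_match))
```

Because the judge is passed in as a function, an LLM-backed judge could replace `normalized_match` without changing `accuracy`.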

Maintenance & Community

The project is contributed by Jialong Wu (jialongwu@alibaba-inc.com, jialongwu@seu.edu.cn). The README acknowledges ReAct, Qwen-Agent, and LangChain.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README mentions a collection of approximately 14k "silver QA pairs" which are not yet human-verified. The project is associated with a preprint, indicating it may be under active development.

Health Check

Last Commit

3 days ago

Responsiveness

1 day

Star History

370 stars in the last 30 days