Benchmark for LLMs in web traversal
This repository introduces WebWalker, a benchmark and framework for evaluating Large Language Models (LLMs) on complex web traversal tasks. It addresses long-context information seeking with a multi-agent approach to memory management, and is aimed at researchers and developers working on LLM-powered web navigation and information retrieval.
How It Works
WebWalker uses a multi-agent framework designed to manage memory efficiently during long-context web traversal. The approach aims to overcome the limits of standard LLMs when an interaction spans many steps and the navigation path crosses multiple web pages. The system builds on ReACT, Qwen-Agents, and LangChain for agent interactions and web crawling.
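The memory-management idea can be illustrated with a minimal sketch. This is not WebWalker's implementation: the toy site `SITE`, the `Memory` class, and the `is_relevant` and `traverse` functions are all hypothetical names introduced here, and a keyword match stands in for the LLM judgment. It assumes one role walks links while another keeps only the observations relevant to the query, so the retained context stays small however many pages are visited.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical toy "website": URL -> (page text, outgoing links).
SITE = {
    "/": ("Welcome to the conference site.", ["/program", "/venue"]),
    "/program": ("The program lists keynotes and workshops.", ["/program/deadlines"]),
    "/venue": ("The venue is the city convention center.", []),
    "/program/deadlines": ("Workshop paper submission deadline: 1 March.", []),
}


@dataclass
class Memory:
    """Stores only observations judged relevant to the query."""
    notes: list = field(default_factory=list)

    def add(self, url, text):
        self.notes.append((url, text))


def is_relevant(text, keywords):
    """Stand-in for an LLM relevance judgment: a simple keyword match."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def traverse(start, keywords, max_pages=10):
    """Walk the site breadth-first, keeping only relevant page text in memory."""
    memory = Memory()
    queue, seen = deque([start]), {start}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        text, links = SITE[url]
        visited.append(url)
        # Memory role: discard pages that do not help answer the query.
        if is_relevant(text, keywords):
            memory.add(url, text)
        # Explorer role: enqueue links that have not been seen yet.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return memory, visited


memory, visited = traverse("/", keywords=["deadline"])
print(visited)       # pages in the order they were opened
print(memory.notes)  # only the observations that were kept
```

In this sketch four pages are visited but only one note is retained, which is the property a memory-managing agent is after. A real system would replace `is_relevant` with a model call and `SITE` with a crawler.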
Quick Start & Requirements
Install: create a conda environment (conda create -n webwalker python=3.10), install requirements (pip install -r requirements.txt), and run post-installation setup (crawl4ai-setup, crawl4ai-doctor).
Run: cd src, then streamlit run app.py.
Highlighted Details
Maintenance & Community
The project was contributed by Jialong Wu (jialongwu@alibaba-inc.com, jialongwu@seu.edu.cn). The README acknowledges contributions from ReACT, Qwen-Agents, and LangChain.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README mentions a collection of approximately 14k "silver QA pairs" that have not yet been human-verified. The project is associated with a preprint, which suggests it may still be under active development.