webllama by McGill-NLP

Llama-3 agent for web browsing via instructions and dialogue

Created 1 year ago

1,408 stars

Top 28.6% on SourcePulse

Project Summary

WebLlama provides a framework for building and evaluating web-browsing agents powered by Meta's Llama 3 models. It targets researchers and developers who want to create human-centric AI assistants that navigate the web through instructions and dialogue. It offers a fine-tuned Llama-3-8B model that is reported to outperform zero-shot GPT-4V by 18% on the WebLINX 1.0 web navigation benchmark.

How It Works

The project fine-tunes Llama 3 models on the WebLINX dataset, which comprises over 24,000 curated instances of web interactions, including clicks, text inputs, and dialogue acts. The fine-tuned model interprets complex instructions and generates the sequence of actions needed to navigate a website, with the goal of more natural and effective human-AI collaboration in web browsing tasks.
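To make the idea of "generating actions" concrete, here is a minimal sketch of turning a model's text output into a structured action. It assumes actions are emitted as call-like strings such as `click(uid="abc123")` or `say(speaker="navigator", utterance="...")`; that string format, the `Action` class, and `parse_action` are illustrative assumptions, not something this summary specifies.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Action:
    """One predicted step: an intent (click, textinput, say, ...) plus its arguments."""
    intent: str
    args: dict = field(default_factory=dict)


# Assumed format: intent(key="value", key="value"). This is a hypothetical
# convention used for illustration only.
_CALL = re.compile(r'^\s*(\w+)\((.*)\)\s*$', re.DOTALL)
_ARG = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_action(text: str) -> Action:
    """Parse a call-like action string into an Action."""
    match = _CALL.match(text)
    if match is None:
        raise ValueError(f"not an action string: {text!r}")
    intent, raw_args = match.groups()
    # Collect key="value" pairs and unescape any \" inside the values.
    args = {key: value.replace('\\"', '"') for key, value in _ARG.findall(raw_args)}
    return Action(intent=intent, args=args)


if __name__ == "__main__":
    print(parse_action('click(uid="abc123")'))
    print(parse_action('say(speaker="navigator", utterance="Done, anything else?")'))
```

Keeping the intent separate from its arguments makes it straightforward to treat page actions (clicks, text inputs) differently from dialogue acts, which only produce a message.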

Quick Start & Requirements

Install: Install Hugging Face's transformers, datasets, and huggingface_hub libraries (for example, with pip).
Prerequisites: Python, Hugging Face libraries.
Resources: Requires access to the Llama-3-8B-Web model and WebLINX dataset from Hugging Face Hub.
Docs: Homepage
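
The requirements above could be wired together roughly as follows. This is a sketch under stated assumptions: the Hub repository ids `McGill-NLP/Llama-3-8B-Web` and `McGill-NLP/WebLINX`, the `validation` split name, and the idea that a dataset row can be formatted directly into a prompt template are assumptions to verify against the Hugging Face pages. `build_prompt` and `predict_action` are hypothetical helper names.

```python
# Assumed Hub repository ids; confirm them on the Hugging Face Hub before use.
MODEL_ID = "McGill-NLP/Llama-3-8B-Web"
DATASET_ID = "McGill-NLP/WebLINX"


def build_prompt(template: str, sample: dict) -> str:
    """Fill a prompt template with the fields of one dataset row.

    Extra keys in `sample` that the template does not use are ignored.
    """
    return template.format(**sample)


def predict_action(template: str, split: str = "validation", index: int = 0) -> str:
    """Download the dataset and model, then generate the action for one row.

    Needs network access, access to the model on the Hub, and enough memory
    for an 8B-parameter model.
    """
    # Imported lazily so build_prompt works without the heavy dependencies.
    from datasets import load_dataset
    from transformers import pipeline

    rows = load_dataset(DATASET_ID, split=split)
    agent = pipeline("text-generation", model=MODEL_ID, torch_dtype="auto")
    prompt = build_prompt(template, rows[index])
    out = agent(prompt, return_full_text=False)
    return out[0]["generated_text"]
```

The template is passed in by the caller because the expected prompt layout depends on how the model was fine-tuned; the placeholder names it uses must match the dataset's column names.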

Highlighted Details

Llama-3-8B-Web model fine-tuned on 24K WebLINX 1.0 instances.
Outperforms zero-shot GPT-4V by 18% on the WebLINX 1.0 benchmark.
Includes the WebLINX 1.0 benchmark with 150 websites across various domains.
Provides code for fine-tuning, evaluation, and integration with Playwright and BrowserGym.
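
As an illustration of what a Playwright integration might involve, the sketch below dispatches a predicted action to a Playwright `Page`. It is not the project's own integration code: the intent names (`click`, `textinput`, `load`, `say`), the `data-webtasks-id` attribute used to locate elements, and the `uid_selector` and `execute` helpers are assumptions made for this example. Only `page.click`, `page.fill`, and `page.goto` are real Playwright calls.

```python
from typing import Optional


def uid_selector(uid: str) -> str:
    """Build a CSS selector for an element tagged with an id attribute.

    The attribute name is an assumption; adapt it to however elements are
    labeled in your page snapshots.
    """
    return f'[data-webtasks-id="{uid}"]'


def execute(page, intent: str, args: dict) -> Optional[str]:
    """Apply one predicted action to a Playwright page (sync API).

    Returns the utterance for dialogue acts and None for page actions.
    """
    if intent == "click":
        page.click(uid_selector(args["uid"]))
    elif intent == "textinput":
        page.fill(uid_selector(args["uid"]), args["text"])
    elif intent == "load":
        page.goto(args["url"])
    elif intent == "say":
        # Dialogue acts do not touch the page; hand the text back to the caller.
        return args["utterance"]
    else:
        raise ValueError(f"unsupported intent: {intent!r}")
    return None
```

Because `execute` only relies on the three page methods it calls, it can be exercised with a small stand-in object in tests before pointing it at a real browser.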

Maintenance & Community

The project is associated with McGill University's NLP research. Contributions are welcomed via GitHub issues.

Licensing & Compatibility

The code is licensed under MIT. Models and data have their own licenses specified on their respective Hugging Face pages.

Limitations & Caveats

The primary model is based on Llama-3-8B, an 8-billion-parameter model, so running or fine-tuning it may require significant compute. The benchmark covers 150 websites, so performance on unseen or rapidly changing web content may vary. New data and evaluation methods are under active development.

