webllama by McGill-NLP

Llama-3 agent for web browsing via instructions and dialogue

Created 1 year ago
1,411 stars

Top 28.8% on SourcePulse

Project Summary

WebLlama provides a framework for building and evaluating web-browsing agents powered by Meta's Llama 3 models. It targets researchers and developers building human-centric AI assistants that navigate the web through instructions and dialogue. Its fine-tuned Llama-3-8B-Web model is reported to outperform zero-shot GPT-4V on the WebLINX 1.0 web navigation benchmark.

How It Works

The project fine-tunes Llama 3 models on the WebLINX dataset, using about 24,000 curated instances of web interactions that include clicks, text inputs, and dialogue acts. The fine-tuned model interprets the user's instructions and generates the sequence of actions needed to navigate the web, with the aim of more natural and effective human-AI collaboration in web browsing tasks.
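To make the idea of "generating sequential actions" concrete, here is a minimal sketch of how call-like action strings could be parsed into structured records before being handed to a browser driver. The string format (for example click(uid="...")), the Action class, and the parse_action helper are illustrative assumptions, not the project's own API; the sample trace is invented for the example.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class Action:
    """One parsed browsing action: an intent name plus keyword arguments."""
    intent: str
    args: dict = field(default_factory=dict)


def parse_action(text: str) -> Action:
    """Parse a call-like string such as click(uid="a1") into an Action.

    The string is parsed with the ast module and never executed; only
    keyword arguments with literal values are accepted.
    """
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a call-like action: {text!r}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only keyword arguments are supported")
    args = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return Action(intent=node.func.id, args=args)


# Hypothetical trace mixing a dialogue act, a click, and a text input.
trace = [
    'say(speaker="navigator", utterance="Sure, opening the page.")',
    'click(uid="a1b2")',
    'textinput(text="llama", uid="c3d4")',
]
for line in trace:
    print(parse_action(line))
```

Parsing with ast rather than eval keeps model output from being run as code, which matters when the text comes from a generative model.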

Quick Start & Requirements

  • Install: Install Hugging Face's transformers, datasets, and huggingface_hub libraries (for example, with pip).
  • Prerequisites: Python, Hugging Face libraries.
  • Resources: Requires access to the Llama-3-8B-Web model and WebLINX dataset from Hugging Face Hub.
  • Docs: Homepage
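As a rough sketch of the steps above, the model can be loaded through the transformers text-generation pipeline. The Hub repository name McGill-NLP/Llama-3-8B-Web is an assumption based on the project and model names on this page, and build_agent and predict_action are hypothetical helpers, not functions shipped by the project. The input state must already be formatted with the project's prompt template, which is not reproduced here.

```python
MODEL_ID = "McGill-NLP/Llama-3-8B-Web"  # assumed Hugging Face Hub repository name


def build_agent(model_id: str = MODEL_ID, **pipeline_kwargs):
    """Create a text-generation pipeline for the fine-tuned model.

    transformers is imported lazily so this module can be imported
    without the library installed. Extra keyword arguments (for
    example device=0) are passed through to transformers.pipeline.
    """
    from transformers import pipeline  # third-party: pip install transformers

    return pipeline("text-generation", model=model_id, **pipeline_kwargs)


def predict_action(agent, state_text: str, max_new_tokens: int = 64) -> str:
    """Generate the next action for one already-formatted browser state."""
    outputs = agent(
        state_text,
        return_full_text=False,  # return only the newly generated text
        max_new_tokens=max_new_tokens,
    )
    return outputs[0]["generated_text"].strip()
```

A typical call would be agent = build_agent(device=0) followed by predict_action(agent, state_text). The first call downloads the model weights, so it needs network access, Hub access to the model, and enough GPU memory for an 8B-parameter model.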

Highlighted Details

  • Llama-3-8B-Web model fine-tuned on 24K WebLINX 1.0 instances.
  • Outperforms zero-shot GPT-4V by 18% on the WebLINX 1.0 benchmark.
  • Includes the WebLINX 1.0 benchmark with 150 websites across various domains.
  • Provides code for fine-tuning, evaluation, and integration with Playwright and BrowserGym.

Maintenance & Community

The project is associated with McGill University's NLP research. Contributions are welcomed via GitHub issues.

Licensing & Compatibility

The code is licensed under MIT. Models and data have their own licenses specified on their respective Hugging Face pages.

Limitations & Caveats

The primary model is based on Llama-3-8B, an 8-billion-parameter model, so inference and fine-tuning may have significant computational requirements. Although the model was evaluated on 150 websites, its performance on unseen or rapidly changing web content may vary. The project is actively developing new data and evaluation methods.
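To give a rough sense of the computational requirements, the sketch below estimates the memory needed just to hold the model weights at different numeric precisions. It assumes a nominal 8 billion parameters and counts weights only; activations, the key-value cache, and optimizer state during fine-tuning add to this. The weight_memory_gib helper is hypothetical.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NUM_PARAMS = 8e9  # nominal size of an "8B" model (assumption)

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Under these assumptions, half-precision weights need about half the memory of full-precision weights, which is why 8B models are usually served in 16-bit or lower precision.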

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
