webllama  by McGill-NLP

Llama-3 agent for web browsing via instructions and dialogue

created 1 year ago
1,410 stars

Top 29.4% on sourcepulse

Project Summary

WebLlama provides a framework for building and evaluating web-browsing agents powered by Meta's Llama 3 models. It targets researchers and developers who want to build human-centric AI assistants that navigate the web through instructions and dialogue. The project ships a fine-tuned Llama-3-8B model that outperforms zero-shot GPT-4V on the WebLINX 1.0 benchmark.

How It Works

The project fine-tunes Llama 3 models on the WebLINX dataset, which comprises over 24,000 curated instances of web interactions including clicks, text inputs, and dialogue acts. This approach leverages large language models for understanding complex instructions and generating sequential actions for web navigation, aiming for more natural and effective human-AI collaboration in web browsing tasks.
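To make the idea of "generating sequential actions" more concrete, here is a minimal sketch of how a model's text output could be turned into a structured action. The call-like string format (`click(uid="...")`, `textinput(text="...", uid="...")`, `say(...)`) and the names `Action` and `parse_action` are assumptions made for illustration; the summary above does not specify the exact output format.

```python
import ast
import re
from dataclasses import dataclass, field

# Assumed output shape (hypothetical): intent(key="value", key2="value2")
_CALL_RE = re.compile(r'^\s*(?P<intent>\w+)\((?P<args>.*)\)\s*$', re.DOTALL)
# One keyword argument whose value is a double-quoted string literal.
_ARG_RE = re.compile(r'(\w+)\s*=\s*("(?:[^"\\]|\\.)*")')


@dataclass
class Action:
    """A single step the agent wants to take (illustrative structure)."""
    intent: str
    args: dict = field(default_factory=dict)


def parse_action(text: str) -> Action:
    """Parse a call-like action string into an Action, or raise ValueError."""
    match = _CALL_RE.match(text)
    if match is None:
        raise ValueError(f"not an action string: {text!r}")
    # literal_eval safely decodes each quoted value, including escapes.
    args = {
        key: ast.literal_eval(value)
        for key, value in _ARG_RE.findall(match.group("args"))
    }
    return Action(match.group("intent"), args)


# Example model outputs covering a click, a text input, and a dialogue act.
for raw in (
    'click(uid="abc-123")',
    'textinput(text="hello, world", uid="q1")',
    'say(speaker="navigator", utterance="Done.")',
):
    print(parse_action(raw))
```

Keeping the parsed result as a small dataclass makes it easy to log a whole sequence of steps or to hand each step to a browser driver.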

Quick Start & Requirements

  • Install: Install Hugging Face's transformers, datasets, and huggingface_hub libraries (for example, with pip).
  • Prerequisites: Python, Hugging Face libraries.
  • Resources: Requires access to the Llama-3-8B-Web model and WebLINX dataset from Hugging Face Hub.
  • Docs: Homepage
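
The requirements above might translate into something like the following sketch. The Hub repository ids `McGill-NLP/Llama-3-8B-Web` and `McGill-NLP/WebLINX`, the `validation` split name, and the helper names are assumptions inferred from the names in this summary; check the model and dataset pages before relying on them. The prompt is left to the caller because the expected prompt template is not described here.

```python
MODEL_ID = "McGill-NLP/Llama-3-8B-Web"  # assumed Hub repository id
DATASET_ID = "McGill-NLP/WebLINX"       # assumed Hub repository id


def load_validation_split():
    """Download the dataset's validation split (split name is an assumption)."""
    # Imported lazily so that importing this module stays cheap.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="validation")


def predict_action(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate the next action for an already formatted prompt.

    Downloads the model weights on first use, so this needs network
    access and enough memory for an 8B-parameter model.
    """
    from transformers import pipeline

    # torch_dtype="auto" loads the weights in the precision stored in the checkpoint.
    agent = pipeline("text-generation", model=MODEL_ID, torch_dtype="auto")
    outputs = agent(prompt, max_new_tokens=max_new_tokens, return_full_text=False)
    return outputs[0]["generated_text"]
```

Calling `predict_action("...")` with a formatted browsing state would return the generated text for the next step. Depending on how the model repository is configured, you may first need to accept its license and log in to the Hub.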

Highlighted Details

  • Llama-3-8B-Web model fine-tuned on 24K WebLINX 1.0 instances.
  • Outperforms zero-shot GPT-4V by 18% on the WebLINX 1.0 benchmark.
  • Includes the WebLINX 1.0 benchmark with 150 websites across various domains.
  • Provides code for fine-tuning, evaluation, and integration with Playwright and BrowserGym.
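
As a rough illustration of what a Playwright integration involves, the sketch below dispatches a predicted action to a page object. It is not the project's own integration code. The intent names (`click`, `textinput`, `load`), the `data-webtasks-id` attribute used to locate elements, and the function names are assumptions; only `page.click`, `page.fill`, and `page.goto` are real Playwright page methods.

```python
from typing import Any, Mapping

# Assumption: each element carries a unique id in this HTML attribute.
UID_ATTRIBUTE = "data-webtasks-id"


def selector_for(uid: str, attribute: str = UID_ATTRIBUTE) -> str:
    """Build a CSS attribute selector for an element id."""
    return f'[{attribute}="{uid}"]'


def execute_action(page: Any, intent: str, args: Mapping[str, str]) -> None:
    """Run one predicted action on a Playwright-style page object."""
    if intent == "click":
        page.click(selector_for(args["uid"]))
    elif intent == "textinput":
        page.fill(selector_for(args["uid"]), args["text"])
    elif intent == "load":
        page.goto(args["url"])
    else:
        # Dialogue acts and unknown intents are not browser operations.
        raise ValueError(f"unsupported intent: {intent!r}")
```

With Playwright's synchronous API, `page` would typically come from `sync_playwright()` followed by `browser.new_page()`. Because the function only relies on three methods, it can also be exercised in unit tests with a small stand-in object that records calls.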

Maintenance & Community

The project is associated with McGill University's NLP research. Contributions are welcomed via GitHub issues.

Licensing & Compatibility

The code is licensed under MIT. Models and data have their own licenses specified on their respective Hugging Face pages.

Limitations & Caveats

The primary model is based on Llama-3-8B, an 8-billion-parameter model that needs a GPU with substantial memory (or quantization) for practical inference. Although the benchmark covers 150 websites, performance on unseen or rapidly changing web content may differ. The project is actively developing new data and evaluation methods.
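
For a sense of scale, the back-of-the-envelope calculation below estimates the memory needed just to hold the weights at different precisions. It assumes a nominal 8 billion parameters and ignores activations, the key-value cache, and framework overhead, so real usage will be higher. The function name is illustrative.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NOMINAL_PARAMS = 8e9  # assumption: "8B" taken as exactly 8 billion

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(NOMINAL_PARAMS, dtype):.1f} GB")
```

At 16-bit precision this comes to about 16 GB for the weights, which is why 8-bit or 4-bit quantization is a common way to fit models of this size on smaller GPUs.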

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 90 days
