Llama-3 agent for web browsing via instructions and dialogue
WebLlama provides a framework for building and evaluating web-browsing agents powered by Meta's Llama 3 models. It targets researchers and developers building human-centric AI assistants that navigate the web through instructions and dialogue. The project offers a fine-tuned Llama-3-8B model that is reported to outperform GPT-4V on web navigation benchmarks.
How It Works
The project fine-tunes Llama 3 models on the WebLINX dataset, which comprises over 24,000 curated instances of web interactions including clicks, text inputs, and dialogue acts. This approach leverages large language models for understanding complex instructions and generating sequential actions for web navigation, aiming for more natural and effective human-AI collaboration in web browsing tasks.
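To make "generating sequential actions" more concrete, here is a minimal sketch of how a model's text output could be turned into a structured action. The action-string shape (for example `click(uid="...")` or `say(speaker="...", utterance="...")`), the `Action` class, and the `parse_action` helper are assumptions made for illustration; they are not taken from the WebLlama code, and the real output format may differ.

```python
import ast
import re
from dataclasses import dataclass, field

# Assumed (hypothetical) shape of one generated action: name(key="value", ...)
ACTION_RE = re.compile(r"^\s*(?P<intent>[a-z_]+)\((?P<args>.*)\)\s*$", re.DOTALL)


@dataclass
class Action:
    """One step of a browsing session: an intent plus its keyword arguments."""
    intent: str
    args: dict = field(default_factory=dict)


def parse_action(text: str) -> Action:
    """Parse a string such as click(uid="abc") into an Action.

    Raises ValueError if the text does not look like name(key=literal, ...).
    """
    match = ACTION_RE.match(text)
    if match is None:
        raise ValueError(f"not an action string: {text!r}")
    # Reuse Python's own parser to handle quoting and commas inside strings.
    try:
        call = ast.parse(f"f({match['args']})", mode="eval").body
    except SyntaxError as exc:
        raise ValueError(f"malformed arguments in: {text!r}") from exc
    if call.args:
        raise ValueError("only keyword arguments are supported in this sketch")
    # literal_eval accepts AST nodes and rejects anything that is not a literal.
    args = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return Action(match["intent"], args)


if __name__ == "__main__":
    # A short, made-up sequence of actions mixing browsing and dialogue.
    for raw in ['click(uid="abc-123")',
                'say(speaker="navigator", utterance="Done (I think).")']:
        print(parse_action(raw))
```

Parsing into a small dataclass keeps the step that executes an action in a browser separate from the step that validates what the model produced, so malformed generations can be rejected before anything is clicked or typed.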
Quick Start & Requirements
transformers, datasets, and huggingface_hub libraries.
Llama-3-8B-Web model and WebLINX dataset from Hugging Face Hub.
Highlighted Details
Llama-3-8B-Web model fine-tuned on 24K WebLINX 1.0 instances.
Maintenance & Community
The project is associated with the NLP research group at McGill University. Contributions are welcomed via GitHub issues.
Licensing & Compatibility
The code is licensed under MIT. Models and data have their own licenses specified on their respective Hugging Face pages.
Limitations & Caveats
The primary model is based on Llama-3-8B, which may have significant computational requirements. While evaluated on 150 websites, generalization to unseen or rapidly changing web content may vary. The project is actively developing new data and evaluation methods.
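As a rough guide to what "significant computational requirements" can mean, the sketch below estimates the memory needed just to hold the model weights at different precisions. The parameter count of 8 billion is an assumption read off the "8B" in the model name, and the `weight_memory_gib` helper is hypothetical. Activations, the KV cache, and framework overhead are not included, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Assumption: roughly 8 billion parameters, per the "8B" in the model name.
PARAMS = 8e9

if __name__ == "__main__":
    precisions = [("float32", 4), ("float16/bfloat16", 2), ("int8", 1), ("4-bit", 0.5)]
    for label, nbytes in precisions:
        print(f"{label:>17}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

Under these assumptions, 16-bit weights alone come to roughly 15 GiB, which is why quantized loading is a common choice on smaller GPUs.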