Web-Use by CursorTouch

LLM-powered autonomous browser agent for web task automation

Created 1 year ago
256 stars

Top 98.5% on SourcePulse

Summary

Web-Use is an autonomous browser agent that uses the Chrome DevTools Protocol (CDP) and a choice of Large Language Models (LLMs) to automate complex web interactions. It targets developers and power users who want to automate tasks such as navigation, form filling, smart searching, and file operations.

How It Works

The agent uses CDP for direct browser control and delegates reasoning to one of several supported LLMs. Its core idea is to build a "Semantic Tree" directly from the DOM, which gives the model accurate structural context through CSS selectors and roles instead of brittle XPaths. It also supports the Web Model Context Protocol (WebMCP) for dynamically discovering and using website-specific tools.
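
To make the Semantic Tree idea concrete, here is a minimal sketch that extracts interactive elements from static HTML and labels each with a role and a CSS selector. It is not Web-Use's implementation: the real agent reads the live DOM over CDP, whereas this sketch uses Python's standard html.parser, and the names IMPLICIT_ROLES, SemanticTreeBuilder, and build_semantic_tree are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical tag-to-role mapping for illustration; not taken from Web-Use.
IMPLICIT_ROLES = {"a": "link", "button": "button", "input": "textbox",
                  "select": "combobox", "textarea": "textbox"}
# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"input", "br", "img", "hr", "meta", "link"}


class SemanticTreeBuilder(HTMLParser):
    """Collects interactive elements with a role and a CSS selector."""

    def __init__(self):
        super().__init__()
        self.stack = []      # selector steps of the currently open elements
        self.counts = [{}]   # per-depth sibling counters, keyed by tag name
        self.nodes = []      # flat list of discovered semantic nodes

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        siblings = self.counts[-1]
        siblings[tag] = siblings.get(tag, 0) + 1
        step = f"{tag}:nth-of-type({siblings[tag]})"
        role = attrs.get("role") or IMPLICIT_ROLES.get(tag)
        if role:
            # Prefer a stable id selector; otherwise use the structural path.
            if attrs.get("id"):
                selector = f"#{attrs['id']}"
            else:
                selector = " > ".join(self.stack + [step])
            name = attrs.get("aria-label") or attrs.get("name") or ""
            self.nodes.append({"role": role, "selector": selector, "name": name})
        if tag not in VOID_TAGS:
            self.stack.append(step)
            self.counts.append({})

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1].startswith(tag + ":"):
            self.stack.pop()
            self.counts.pop()


def build_semantic_tree(html):
    """Return the list of role/selector nodes found in an HTML string."""
    builder = SemanticTreeBuilder()
    builder.feed(html)
    return builder.nodes


if __name__ == "__main__":
    page = '<body><form><input name="q"><button id="go">Search</button></form></body>'
    for node in build_semantic_tree(page):
        print(node)
```

Selectors built this way survive small layout changes better when an id is available, which is one reason to prefer them over long positional XPaths.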

Quick Start & Requirements

  • Prerequisites: Python 3.11+.
  • Installation: Clone the repository, navigate to the directory, and run uv sync to install dependencies.
  • Configuration: Set up a .env file with necessary API keys (e.g., GOOGLE_API_KEY).
  • Execution: Run the main script using uv run main.py.
  • Documentation: Refer to the CONTRIBUTING file for development guidelines.
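
Before running the agent, it can help to verify the prerequisites listed above. The following is a small sketch, not part of Web-Use: load_env and check_setup are hypothetical helpers, and the simple KEY=VALUE parsing is an assumption about how the .env file is laid out.

```python
import os
import sys
from pathlib import Path


def load_env(path=".env"):
    """Parse KEY=VALUE lines from a .env file; blanks and # comments are skipped."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_setup(env_path=".env", required=("GOOGLE_API_KEY",)):
    """Return a list of problems; an empty list means the setup looks ready."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11+ is required")
    # Variables already exported in the shell take precedence over the file.
    env = {**load_env(env_path), **os.environ}
    for key in required:
        if not env.get(key):
            problems.append(f"{key} is not set")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("Setup problem:", problem)
```

Which keys are actually required depends on the LLM provider you choose; GOOGLE_API_KEY is only the example named above.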

Highlighted Details

  • Multi-LLM Support: Integrates with numerous LLMs including OpenAI, Gemini, Claude, Groq, Ollama, Mistral, and Cerebras.
  • Vision Capability: Employs scroll-aware bounding boxes for accurate visual understanding of web page elements.
  • Semantic Tree: Generates a detailed tree representation from the real DOM for robust element identification.
  • OAuth 2.0 + PKCE: Built-in support for secure, passwordless authentication with persistent token storage.
  • Web Model Context Protocol (WebMCP): Enables automatic discovery and integration of custom tools exposed by websites.
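
For readers unfamiliar with the PKCE part of the OAuth 2.0 bullet above, the sketch below shows how a code verifier and its S256 code challenge are derived according to RFC 7636. It illustrates the standard only; how Web-Use generates or stores these values is not described in the source, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def pkce_challenge(verifier):
    """Derive the S256 code challenge: base64url(SHA-256(verifier)), no padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair():
    """Create a random code verifier (43-128 chars allowed) and its challenge."""
    verifier = secrets.token_urlsafe(64)  # 64 random bytes give 86 URL-safe chars
    return verifier, pkce_challenge(verifier)


if __name__ == "__main__":
    verifier, challenge = make_pkce_pair()
    print("code_verifier: ", verifier)
    print("code_challenge:", challenge)
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code cannot be redeemed without the original verifier.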

Maintenance & Community

The project is primarily maintained by Jeomon George and Muhammad Yaseen. The README does not mention community channels (such as Discord or Slack) or a public roadmap.

Licensing & Compatibility

Licensed under the MIT License, permitting broad use, modification, and distribution, including for commercial purposes and linking within closed-source applications.

Limitations & Caveats

The agent is bounded by max_steps and max_consecutive_failures settings, so a run can stop before the task completes. Advanced features such as use_system_profile may require specific setup in the user's environment. Several LLM integrations require API keys, which the user must manage.
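
The effect of these two limits can be sketched with a simple control loop. This is an assumed reading of the parameter names, not Web-Use's actual loop: run_agent, step_fn, the return strings, and the default values are all hypothetical.

```python
def run_agent(step_fn, max_steps=10, max_consecutive_failures=3):
    """Run step_fn(step_index) until it returns "done" or a limit is hit.

    step_fn returns "done" when the task is finished, any other value to
    continue, and raises an exception to signal a failed step.
    """
    failures = 0
    for step in range(max_steps):
        try:
            result = step_fn(step)
        except Exception:
            failures += 1
            if failures >= max_consecutive_failures:
                return "aborted: too many consecutive failures"
            continue
        failures = 0  # any successful step resets the failure streak
        if result == "done":
            return "completed"
    return "stopped: step limit reached"


if __name__ == "__main__":
    # A task that needs three successful steps finishes within the limits.
    print(run_agent(lambda i: "done" if i == 2 else "ok"))
```

Under this reading, a task that needs more steps than max_steps ends without completing, which is why long workflows may need a higher limit.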

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 24 stars in the last 30 days
