skyvern  by Skyvern-AI

CLI tool for browser automation using LLMs and computer vision

created 1 year ago
13,970 stars

Top 3.6% on sourcepulse

GitHubView on GitHub
Project Summary

Skyvern automates browser-based workflows using LLMs and computer vision, targeting users who need to replace brittle, DOM-dependent automation scripts. It offers a more robust and adaptable approach by leveraging visual understanding and natural language prompts to interact with websites, enabling zero-shot automation on unseen sites and resilience to UI changes.

How It Works

Skyvern employs a swarm of specialized agents inspired by autonomous agent designs. Key agents include: Interactable Element Agent for parsing HTML and identifying interactive elements, Navigation Agent for planning and executing actions like clicks and text input, and Data Extraction Agent for structured data retrieval. This multi-agent system, combined with LLM reasoning, allows Skyvern to comprehend complex interactions and adapt to dynamic web content without pre-defined selectors.

Quick Start & Requirements

  • Install: pip install skyvern
  • Prerequisites: Python 3.11.
  • Setup: Run skyvern init for configuration, then skyvern run server and skyvern run ui.
  • Docker: Clone repo, configure docker-compose.yml with LLM keys, run docker compose up -d. Access UI at http://localhost:8080.
  • Docs: https://docs.skyvern.com/

Highlighted Details

  • Leverages LLMs for reasoning, enabling complex task completion and cross-website workflow application.
  • Supports a wide range of LLM providers including OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, and Novita AI.
  • Offers livestreaming of browser viewports for debugging and intervention.
  • Features include workflow chaining, file downloading, form filling, and beta authentication support (Bitwarden integration).

Maintenance & Community

  • Active development with a public roadmap.
  • Community support via Discord and email.
  • Contributions are welcomed via PRs and issues.

Licensing & Compatibility

  • Licensed under AGPL-3.0.
  • AGPL-3.0 is a strong copyleft license, requiring derivative works to also be open-sourced under the same license. Commercial use or linking with closed-source applications may require a separate commercial license or careful consideration of AGPL obligations.

Limitations & Caveats

  • Local setup is primarily tested on macOS.
  • AGPL-3.0 license may impose significant restrictions on commercial or closed-source integration.
  • Some advanced features like conditionals and custom code blocks are marked as "coming soon."
Health Check
Last commit

19 hours ago

Responsiveness

Inactive

Pull Requests (30d)
210
Issues (30d)
13
Star History
818 stars in the last 90 days

Explore Similar Projects

Starred by Elie Bursztein Elie Bursztein(Cybersecurity Lead at Google DeepMind), Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), and
1 more.

SuperAGI by TransformerOptimus

0.2%
17k
Open-source framework for autonomous AI agent development
created 2 years ago
updated 6 months ago
Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Pietro Schirano Pietro Schirano(Founder of MagicPath), and
1 more.

SillyTavern by SillyTavern

3.2%
17k
LLM frontend for power users
created 2 years ago
updated 2 days ago
Feedback? Help us improve.