skyvern  by Skyvern-AI

CLI tool for browser automation using LLMs and computer vision

Created 1 year ago
14,268 stars

Top 3.5% on SourcePulse

GitHubView on GitHub
Project Summary

Skyvern automates browser-based workflows using LLMs and computer vision, targeting users who need to replace brittle, DOM-dependent automation scripts. It offers a more robust and adaptable approach by leveraging visual understanding and natural language prompts to interact with websites, enabling zero-shot automation on unseen sites and resilience to UI changes.

How It Works

Skyvern employs a swarm of specialized agents inspired by autonomous agent designs. Key agents include: Interactable Element Agent for parsing HTML and identifying interactive elements, Navigation Agent for planning and executing actions like clicks and text input, and Data Extraction Agent for structured data retrieval. This multi-agent system, combined with LLM reasoning, allows Skyvern to comprehend complex interactions and adapt to dynamic web content without pre-defined selectors.

Quick Start & Requirements

  • Install: pip install skyvern
  • Prerequisites: Python 3.11.
  • Setup: Run skyvern init for configuration, then skyvern run server and skyvern run ui.
  • Docker: Clone repo, configure docker-compose.yml with LLM keys, run docker compose up -d. Access UI at http://localhost:8080.
  • Docs: https://docs.skyvern.com/

Highlighted Details

  • Leverages LLMs for reasoning, enabling complex task completion and cross-website workflow application.
  • Supports a wide range of LLM providers including OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, and Novita AI.
  • Offers livestreaming of browser viewports for debugging and intervention.
  • Features include workflow chaining, file downloading, form filling, and beta authentication support (Bitwarden integration).

Maintenance & Community

  • Active development with a public roadmap.
  • Community support via Discord and email.
  • Contributions are welcomed via PRs and issues.

Licensing & Compatibility

  • Licensed under AGPL-3.0.
  • AGPL-3.0 is a strong copyleft license, requiring derivative works to also be open-sourced under the same license. Commercial use or linking with closed-source applications may require a separate commercial license or careful consideration of AGPL obligations.

Limitations & Caveats

  • Local setup is primarily tested on macOS.
  • AGPL-3.0 license may impose significant restrictions on commercial or closed-source integration.
  • Some advanced features like conditionals and custom code blocks are marked as "coming soon."
Health Check
Last Commit

12 hours ago

Responsiveness

1 day

Pull Requests (30d)
209
Issues (30d)
26
Star History
223 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.