skyvern by Skyvern-AI

CLI tool for browser automation using LLMs and computer vision

Created 1 year ago

20,072 stars

Top 2.2% on SourcePulse

15 Experts Love This Project

Amanpreet Singh

Cofounder of Contextual AI

Elie Bursztein

Cybersecurity Lead at Google DeepMind

Dan Guido

Cofounder of Trail of Bits

Jordan Burgess

Cofounder of Humanloop

and 11 more!

Project Summary

Skyvern automates browser-based workflows using LLMs and computer vision, targeting users who need to replace brittle, DOM-dependent automation scripts. It offers a more robust and adaptable approach by leveraging visual understanding and natural language prompts to interact with websites, enabling zero-shot automation on unseen sites and resilience to UI changes.

How It Works

Skyvern employs a swarm of specialized agents inspired by autonomous agent designs. Key agents include: Interactable Element Agent for parsing HTML and identifying interactive elements, Navigation Agent for planning and executing actions like clicks and text input, and Data Extraction Agent for structured data retrieval. This multi-agent system, combined with LLM reasoning, allows Skyvern to comprehend complex interactions and adapt to dynamic web content without pre-defined selectors.
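The division of labor described above can be illustrated with a toy sketch. This is not Skyvern's code: the `Element` class, the `find_interactable_elements` and `plan_actions` functions, and the set of tags treated as interactive are all hypothetical, and the planning step uses a simple rule where the real system would rely on LLM reasoning. The data extraction stage is omitted.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Tags treated as interactive in this toy example (an assumption,
# not Skyvern's actual rule set).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass
class Element:
    element_id: int
    tag: str
    attrs: dict


class InteractableElementParser(HTMLParser):
    """Collects interactive elements in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append(Element(len(self.elements), tag, dict(attrs)))


def find_interactable_elements(html: str) -> list[Element]:
    """Stand-in for the element-identification step."""
    parser = InteractableElementParser()
    parser.feed(html)
    parser.close()
    return parser.elements


def plan_actions(elements, form_values):
    """Stand-in for the navigation step.

    Fills inputs whose name appears in form_values, then clicks the
    first button. A real planner would ask an LLM to choose actions.
    """
    actions = []
    for el in elements:
        name = el.attrs.get("name")
        if el.tag in {"input", "textarea"} and name in form_values:
            actions.append(("input_text", el.element_id, form_values[name]))
    for el in elements:
        if el.tag == "button":
            actions.append(("click", el.element_id, None))
            break
    return actions


PAGE = """
<form>
  <input name="email" type="text">
  <input name="zip" type="text">
  <button type="submit">Get quote</button>
  <p>Not interactive</p>
</form>
"""

elements = find_interactable_elements(PAGE)
actions = plan_actions(elements, {"email": "user@example.com"})
for action in actions:
    print(action)
```

The point of the split is that the planner only ever sees a short, numbered list of candidate elements rather than raw HTML, so it can refer to targets by ID instead of by a hand-written selector.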

Quick Start & Requirements

Install: pip install skyvern
Prerequisites: Python 3.11.
Setup: Run skyvern init for configuration, then skyvern run server and skyvern run ui.
Docker: Clone repo, configure docker-compose.yml with LLM keys, run docker compose up -d. Access UI at http://localhost:8080.
Docs: https://docs.skyvern.com/
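
As a convenience, the prerequisite and setup order above could be checked with a small preflight script. This is only a sketch: the helper names `python_matches` and `preflight_report` are hypothetical, the commands are copied from the steps above rather than executed, and treating "Python 3.11" as an exact major.minor match is an assumption, since the summary does not say whether newer versions work.

```python
import sys

# Required interpreter version, taken from the prerequisite above.
# Exact major.minor matching is an assumption of this sketch.
REQUIRED = (3, 11)

# Commands copied from the install and setup steps above, in order.
SETUP_COMMANDS = [
    "pip install skyvern",
    "skyvern init",
    "skyvern run server",
    "skyvern run ui",
]


def python_matches(version=None, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) == required


def preflight_report(version=None):
    """Return the numbered setup commands, or a single warning line."""
    if version is None:
        version = sys.version_info
    if not python_matches(version):
        return [
            f"Python {version[0]}.{version[1]} found; "
            f"the listed prerequisite is {REQUIRED[0]}.{REQUIRED[1]}"
        ]
    return [f"{i}. {cmd}" for i, cmd in enumerate(SETUP_COMMANDS, start=1)]


print("\n".join(preflight_report()))
```

The script only prints the steps; running them is left to the user, because `skyvern init` is a configuration step.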

Highlighted Details

Leverages LLMs for reasoning, enabling complex task completion and cross-website workflow application.
Supports a wide range of LLM providers including OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, and Novita AI.
Offers livestreaming of browser viewports for debugging and intervention.
Features include workflow chaining, file downloading, form filling, and beta authentication support (Bitwarden integration).

Maintenance & Community

Active development with a public roadmap.
Community support via Discord and email.
Contributions are welcomed via PRs and issues.

Licensing & Compatibility

Licensed under AGPL-3.0.
AGPL-3.0 is a strong copyleft license: derivative works that are distributed, or modified versions that users interact with over a network, must make their source available under the same license. Commercial use or linking with closed-source applications may require a separate commercial license or careful consideration of AGPL obligations.

Limitations & Caveats

Local setup is primarily tested on macOS.
AGPL-3.0 license may impose significant restrictions on commercial or closed-source integration.
Some advanced features like conditionals and custom code blocks are marked as "coming soon."

Health Check

Last Commit: 1 day ago
Responsiveness: 1 day
Pull Requests (30d): 144

Issues (30d)

Star History: 427 stars in the last 30 days