skyvern  by Skyvern-AI

CLI tool for browser automation using LLMs and computer vision

Created 2 years ago
20,520 stars

Top 2.2% on SourcePulse

GitHubView on GitHub
Project Summary

Skyvern automates browser-based workflows using LLMs and computer vision, targeting users who need to replace brittle, DOM-dependent automation scripts. It offers a more robust and adaptable approach by leveraging visual understanding and natural language prompts to interact with websites, enabling zero-shot automation on unseen sites and resilience to UI changes.

How It Works

Skyvern employs a swarm of specialized agents inspired by autonomous agent designs. Key agents include: Interactable Element Agent for parsing HTML and identifying interactive elements, Navigation Agent for planning and executing actions like clicks and text input, and Data Extraction Agent for structured data retrieval. This multi-agent system, combined with LLM reasoning, allows Skyvern to comprehend complex interactions and adapt to dynamic web content without pre-defined selectors.

Quick Start & Requirements

  • Install: pip install skyvern
  • Prerequisites: Python 3.11.
  • Setup: Run skyvern init for configuration, then skyvern run server and skyvern run ui.
  • Docker: Clone repo, configure docker-compose.yml with LLM keys, run docker compose up -d. Access UI at http://localhost:8080.
  • Docs: https://docs.skyvern.com/

Highlighted Details

  • Leverages LLMs for reasoning, enabling complex task completion and cross-website workflow application.
  • Supports a wide range of LLM providers including OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, and Novita AI.
  • Offers livestreaming of browser viewports for debugging and intervention.
  • Features include workflow chaining, file downloading, form filling, and beta authentication support (Bitwarden integration).

Maintenance & Community

  • Active development with a public roadmap.
  • Community support via Discord and email.
  • Contributions are welcomed via PRs and issues.

Licensing & Compatibility

  • Licensed under AGPL-3.0.
  • AGPL-3.0 is a strong copyleft license, requiring derivative works to also be open-sourced under the same license. Commercial use or linking with closed-source applications may require a separate commercial license or careful consideration of AGPL obligations.

Limitations & Caveats

  • Local setup is primarily tested on macOS.
  • AGPL-3.0 license may impose significant restrictions on commercial or closed-source integration.
  • Some advanced features like conditionals and custom code blocks are marked as "coming soon."
Health Check
Last Commit

17 hours ago

Responsiveness

1 day

Pull Requests (30d)
317
Issues (30d)
27
Star History
358 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Abubakar Abid Abubakar Abid(Cofounder of Gradio), and
3 more.

owl by camel-ai

0.2%
19k
Multi-agent framework for real-world task automation
Created 11 months ago
Updated 1 day ago
Feedback? Help us improve.