CLI tool for browser automation using LLMs and computer vision
Skyvern automates browser-based workflows using LLMs and computer vision, targeting users who need to replace brittle, DOM-dependent automation scripts. It offers a more robust and adaptable approach by leveraging visual understanding and natural language prompts to interact with websites, enabling zero-shot automation on unseen sites and resilience to UI changes.
How It Works
Skyvern employs a swarm of specialized agents inspired by autonomous agent designs. Key agents include the Interactable Element Agent, which parses HTML and identifies interactive elements; the Navigation Agent, which plans and executes actions such as clicks and text input; and the Data Extraction Agent, which retrieves structured data. This multi-agent system, combined with LLM reasoning, allows Skyvern to comprehend complex interactions and adapt to dynamic web content without pre-defined selectors.
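The division of labor among the three agents can be illustrated with a toy pipeline. This is a minimal sketch and not Skyvern's actual code: every name in it (Element, find_elements, plan_actions, extract_data, PAGE) is hypothetical, and a simple name match stands in for the LLM reasoning that the real Navigation Agent relies on.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Element:
    """An interactive element found on a page (hypothetical structure)."""
    index: int
    tag: str
    attrs: dict


class InteractableElementParser(HTMLParser):
    """Stand-in for the Interactable Element Agent: collects interactive tags."""

    INTERACTIVE = {"a", "button", "input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INTERACTIVE:
            self.elements.append(Element(len(self.elements), tag, dict(attrs)))


def find_elements(html):
    """Parse HTML and return the interactive elements in document order."""
    parser = InteractableElementParser()
    parser.feed(html)
    return parser.elements


def plan_actions(goal_fields, elements):
    """Stand-in for the Navigation Agent.

    Assumption: a real system would ask an LLM which element to act on.
    Here, inputs are matched to goal fields by their name attribute,
    and the first button is clicked afterwards.
    """
    actions = []
    for el in elements:
        name = el.attrs.get("name")
        if el.tag == "input" and name in goal_fields:
            actions.append(
                {"type": "input_text", "element": el.index, "text": goal_fields[name]}
            )
    for el in elements:
        if el.tag == "button":
            actions.append({"type": "click", "element": el.index})
            break
    return actions


def extract_data(elements):
    """Stand-in for the Data Extraction Agent: returns link targets as structured data."""
    links = [el.attrs["href"] for el in elements if el.tag == "a" and "href" in el.attrs]
    return {"links": links}


# A made-up page used only for this illustration.
PAGE = """
<form>
  <input name="email" type="text">
  <input name="zip" type="text">
  <button type="submit">Get quote</button>
</form>
<a href="/terms">Terms</a>
"""

if __name__ == "__main__":
    elements = find_elements(PAGE)
    print(plan_actions({"email": "a@example.com"}, elements))
    print(extract_data(elements))
```

Note that the plan refers to elements by index rather than by CSS selector or XPath, which loosely mirrors the idea of acting without pre-defined selectors: the element list is rebuilt from whatever the page currently contains.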
Quick Start & Requirements
Install with pip install skyvern. Run skyvern init for configuration, then skyvern run server and skyvern run ui.
For Docker, set up docker-compose.yml with LLM keys, then run docker compose up -d. Access the UI at http://localhost:8080.
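After starting the services, it can be handy to confirm that the UI is actually reachable before opening a browser or running scripts against it. The helper below is a generic sketch using only the Python standard library; wait_for_ui is a hypothetical name and not part of Skyvern, and the only detail taken from the steps above is the URL http://localhost:8080.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_ui(url="http://localhost:8080", timeout_s=30.0, interval_s=1.0):
    """Poll url until a server answers over HTTP or timeout_s elapses.

    Returns True as soon as any HTTP response arrives, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Nothing is listening yet, or the connection failed; retry below.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("UI reachable:", wait_for_ui())
```

The check treats any HTTP response as success because it only tests that something is listening on the port, not that the application behind it is healthy.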
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats