Autonomous browsing agent for state-of-the-art web task completion
Meka Agent is an open-source autonomous agent for web browsing and general computer interaction. It aims to mimic human interaction by relying purely on visual input and operating within a full computer context, which makes it suitable for researchers and developers building complex automation workflows.
How It Works
Meka Agent takes a vision-centric approach: it processes visual information from the computer environment to decide what to do next, then acts on it. Its architecture is pluggable on two sides. Users can integrate any Large Language Model (LLM) with strong visual grounding (e.g., OpenAI o3, Claude Sonnet 4, Claude Opus 4), and any infrastructure provider that offers OS-level controls beyond browser screenshots. This OS-level access is crucial for interacting with elements such as dropdowns, alerts, and file uploads, which are often rendered at the system level rather than inside the page.
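The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Meka's implementation: the `Action`, `ComputerProvider`, and `VisionModel` types and the `runTask` function are hypothetical names invented here and are not the `@trymeka/core` API. The sketch only shows how a screenshot-driven agent can stay independent of both the model and the infrastructure provider.

```typescript
// Hypothetical types for illustration only; NOT the @trymeka/core API.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done"; result: string };

// Assumed shape of an infrastructure provider with OS-level control.
interface ComputerProvider {
  screenshot(): Promise<string>; // e.g. a base64-encoded image of the full screen
  perform(action: Action): Promise<void>;
}

// Assumed shape of a wrapper around an LLM with visual grounding.
interface VisionModel {
  nextAction(task: string, screenshot: string, history: Action[]): Promise<Action>;
}

// Observe, decide, act: repeat until the model reports completion
// or the step budget runs out.
async function runTask(
  task: string,
  computer: ComputerProvider,
  model: VisionModel,
  maxSteps: number = 20,
): Promise<string> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = await computer.screenshot();
    const action = await model.nextAction(task, shot, history);
    if (action.kind === "done") {
      return action.result;
    }
    await computer.perform(action);
    history.push(action);
  }
  throw new Error(`Task not completed within ${maxSteps} steps`);
}

// In-memory stand-in for a real computer provider; it records actions
// instead of executing them.
function makeFakeComputer(): ComputerProvider & { performed: Action[] } {
  const performed: Action[] = [];
  return {
    performed,
    async screenshot(): Promise<string> {
      return `frame-${performed.length}`;
    },
    async perform(action: Action): Promise<void> {
      performed.push(action);
    },
  };
}

// Stand-in for an LLM: replays a fixed list of actions, one per step.
function makeScriptedModel(script: Action[]): VisionModel {
  return {
    async nextAction(_task: string, _shot: string, history: Action[]): Promise<Action> {
      return script[history.length] ?? { kind: "done", result: "script exhausted" };
    },
  };
}

// Usage: two actions followed by a completion signal.
async function demo(): Promise<string> {
  const computer = makeFakeComputer();
  const model = makeScriptedModel([
    { kind: "click", x: 120, y: 48 },
    { kind: "type", text: "hello" },
    { kind: "done", result: "submitted" },
  ]);
  return runTask("fill in the form", computer, model);
}
```

Because `runTask` depends only on the two interfaces, swapping the model or the provider does not touch the loop itself, which is the kind of flexibility the paragraph above describes.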
Quick Start & Requirements
npm install @trymeka/core @trymeka/ai-provider-vercel @ai-sdk/openai @trymeka/computer-provider-anchor-browser playwright-core
Create a .env file with API keys.
Highlighted Details
Maintenance & Community
The project is open-source and welcomes contributions; links to its contributing guidelines are available.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
Anchor Browser is the primary infrastructure provider mentioned, which suggests potential vendor lock-in or a dependence on specific VM-based environments for full functionality. On the model side, other providers are welcome, but extensive testing is noted for OpenAI and Claude models.