allenai
Autonomous multimodal web agent SDK
Top 64.1% on SourcePulse
MolmoWeb is an open multimodal web agent designed for autonomous web navigation and task completion. It empowers researchers and developers to automate complex interactions with web browsers, such as clicking, typing, and scrolling, driven by natural language prompts. The project provides the agent code, inference client, and evaluation benchmarks, enabling reproducible results for automated web tasks.
How It Works
MolmoWeb employs a multimodal large language model that interprets natural language tasks in conjunction with visual and structural information from web pages (screenshots and accessibility trees). It autonomously generates a sequence of browser actions—clicking elements, typing text, scrolling, and navigating URLs—to fulfill user-defined objectives. This approach allows for sophisticated, multi-step task execution directly within a web browser environment.
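The observe, decide, act cycle described above can be illustrated with a minimal sketch. This is not MolmoWeb's actual API: the `Observation` and `Action` types, the `run_episode` function, and the scripted policy standing in for the model are all hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Observation:
    """What the agent sees at each step (hypothetical structure)."""
    screenshot: bytes          # raw image bytes of the current page
    accessibility_tree: str    # serialized structural view of the page
    url: str


@dataclass
class Action:
    """One browser action (hypothetical structure)."""
    kind: str                      # "click", "type", "scroll", "goto", or "done"
    target: Optional[str] = None   # element description or URL
    text: Optional[str] = None     # text to type, if any


def run_episode(
    task: str,
    observe: Callable[[], Observation],
    decide: Callable[[str, Observation, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat observe -> decide -> act until the policy says "done"."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = observe()
        action = decide(task, obs, history)
        history.append(action)
        if action.kind == "done":
            break
        act(action)  # in a real agent this would drive the browser
    return history


def scripted_policy(task: str, obs: Observation, history: List[Action]) -> Action:
    """Stand-in for the model: replays a fixed script instead of reasoning."""
    script = [
        Action("goto", target="https://example.com"),
        Action("click", target="search box"),
        Action("type", text=task),
        Action("done"),
    ]
    return script[len(history)]


executed: List[Action] = []
history = run_episode(
    task="find the documentation page",
    observe=lambda: Observation(b"", "", "about:blank"),
    decide=scripted_policy,
    act=executed.append,
)
print([a.kind for a in history])
```

In a real agent, `decide` would call the multimodal model with the task, screenshot, and accessibility tree, and `act` would translate each action into a browser command. The `max_steps` cap is a design choice of this sketch to keep a misbehaving policy from looping forever.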
Quick Start & Requirements
Uses uv for dependency management. Clone the repository, create a virtual environment with uv venv, and sync dependencies with uv sync. Playwright browsers must be installed via uv run playwright install --with-deps chromium.
Environment variables: BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID for Browserbase; GOOGLE_API_KEY for Google Gemini; OPENAI_API_KEY for GPT-based judges.
Model setup: use scripts/download_weights.sh to fetch models (e.g., allenai/MolmoWeb-8B) and scripts/start_server.sh to launch a local inference server.
Testing: use scripts/test_server.py to test the running model server.
Highlighted Details
Maintenance & Community
The project does not list community channels (e.g., Discord, Slack), notable contributors, or sponsorships.
Licensing & Compatibility
The project is licensed under the Apache 2.0 license, which is permissive for commercial use and integration into closed-source projects.
Limitations & Caveats
The project's TODO list indicates that evaluation (Eval) and training (Training) functionality is not yet fully implemented. Some backends require external API keys (Browserbase, Google Gemini, and GPT-based judges), which may be a hurdle to adoption.
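One way to reduce the API-key hurdle is to check which keys are present before launching a run. The sketch below is an illustration, not part of the project: the environment variable names come from the requirements listed earlier, while the backend labels (`browserbase`, `gemini`, `gpt_judge`) and the `missing_keys` helper are hypothetical.

```python
import os
from typing import List, Mapping

# Variable names are taken from the project summary; the backend
# labels used as dictionary keys are hypothetical.
REQUIRED_KEYS = {
    "browserbase": ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
    "gemini": ["GOOGLE_API_KEY"],
    "gpt_judge": ["OPENAI_API_KEY"],
}


def missing_keys(backend: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty for a backend."""
    return [name for name in REQUIRED_KEYS[backend] if not env.get(name)]


for backend in REQUIRED_KEYS:
    missing = missing_keys(backend)
    status = "ready" if not missing else "missing " + ", ".join(missing)
    print(f"{backend}: {status}")
```

Passing `env` explicitly makes the helper easy to test without touching the real environment; by default it reads from `os.environ`.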