fara by microsoft

Agentic model for visual computer task automation

Created 4 months ago

4,492 stars

Top 10.9% on SourcePulse

Project Summary

Fara-7B is Microsoft's compact, 7-billion-parameter agentic Small Language Model (SLM) engineered for computer use. It automates complex web-based tasks by directly interacting with computer interfaces, offering an efficient and privacy-preserving alternative to larger, more resource-intensive agent systems. Its primary benefit lies in achieving state-of-the-art performance within its size class, enabling on-device deployment and significantly reducing task completion steps.

How It Works

Fara-7B operates visually, perceiving webpages and executing actions like scrolling, typing, and clicking at predicted coordinates, mimicking human interaction without relying on accessibility trees. This approach is enabled by its foundation on Qwen2.5-VL-7B and training via a novel synthetic data pipeline using the Magentic-One multi-agent framework. This methodology allows for efficient task completion, averaging approximately 16 steps per task, compared to over 40 steps for comparable models.

Quick Start & Requirements

Installation involves cloning the repository, setting up a Python virtual environment, and installing dependencies via pip install -e . and playwright install. For hosting, Azure Foundry is recommended for a serverless experience without local GPU requirements. Alternatively, self-hosting is possible using vLLM (vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto) on machines with sufficient GPU VRAM. Running tasks is done via fara-cli --task "...". Links: Model, Dataset, Azure Foundry.

Highlighted Details

Achieves state-of-the-art results for its 7B parameter size across multiple web agent benchmarks, outperforming larger systems.
Demonstrates high efficiency, averaging ~16 steps per task compared to ~41 for comparable models.
Introduces WebTailBench, a new benchmark evaluating 11 real-world web task types, where Fara-7B shows leading performance among computer-use models.
Leverages visual perception and direct coordinate-based actions for intuitive computer interaction.

Maintenance & Community

This is an experimental release aimed at community exploration and feedback. While specific community channels (like Discord/Slack) are not detailed, the project collaborates with BrowserBase for task annotation.

Licensing & Compatibility

The specific license is not detailed in the provided README. Users should verify compatibility for commercial use or integration into closed-source projects.

Limitations & Caveats

Fara-7B is an experimental release, and users are advised to run it in sandboxed environments, monitor its execution, and avoid sensitive data or high-risk domains. Reproducing results on live websites presents inherent challenges due to dynamic web content, although the project implements measures like BrowserBase integration and task updates to mitigate this.

fara by microsoft

Explore Similar Projects

agent by trymeka

on-device-browser-agent by RunanywhereAI

agentica by shibing624

agentchain by jina-ai

fuji-web by normal-computing

youtu-tip by TencentCloudADP

CogAgent by zai-org

index by lmnr-ai

ShowUI by showlab

gptme by gptme

MiroThinker by MiroMindAI

ANUS by anus-dev