fara  by microsoft

Agentic model for visual computer task automation

Created 1 month ago
1,440 stars

Top 28.2% on SourcePulse

GitHubView on GitHub
Project Summary

Fara-7B is Microsoft's compact, 7-billion-parameter agentic Small Language Model (SLM) engineered for computer use. It automates complex web-based tasks by directly interacting with computer interfaces, offering an efficient and privacy-preserving alternative to larger, more resource-intensive agent systems. Its primary benefit lies in achieving state-of-the-art performance within its size class, enabling on-device deployment and significantly reducing task completion steps.

How It Works

Fara-7B operates visually, perceiving webpages and executing actions like scrolling, typing, and clicking at predicted coordinates, mimicking human interaction without relying on accessibility trees. This approach is enabled by its foundation on Qwen2.5-VL-7B and training via a novel synthetic data pipeline using the Magentic-One multi-agent framework. This methodology allows for efficient task completion, averaging approximately 16 steps per task, compared to over 40 steps for comparable models.

Quick Start & Requirements

Installation involves cloning the repository, setting up a Python virtual environment, and installing dependencies via pip install -e . and playwright install. For hosting, Azure Foundry is recommended for a serverless experience without local GPU requirements. Alternatively, self-hosting is possible using vLLM (vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto) on machines with sufficient GPU VRAM. Running tasks is done via fara-cli --task "...". Links: Model, Dataset, Azure Foundry.

Highlighted Details

  • Achieves state-of-the-art results for its 7B parameter size across multiple web agent benchmarks, outperforming larger systems.
  • Demonstrates high efficiency, averaging ~16 steps per task compared to ~41 for comparable models.
  • Introduces WebTailBench, a new benchmark evaluating 11 real-world web task types, where Fara-7B shows leading performance among computer-use models.
  • Leverages visual perception and direct coordinate-based actions for intuitive computer interaction.

Maintenance & Community

This is an experimental release aimed at community exploration and feedback. While specific community channels (like Discord/Slack) are not detailed, the project collaborates with BrowserBase for task annotation.

Licensing & Compatibility

The specific license is not detailed in the provided README. Users should verify compatibility for commercial use or integration into closed-source projects.

Limitations & Caveats

Fara-7B is an experimental release, and users are advised to run it in sandboxed environments, monitor its execution, and avoid sensitive data or high-risk domains. Reproducing results on live websites presents inherent challenges due to dynamic web content, although the project implements measures like BrowserBase integration and task updates to mitigate this.

Health Check
Last Commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
20
Issues (30d)
17
Star History
1,739 stars in the last 30 days

Explore Similar Projects

Starred by Edward Z. Yang Edward Z. Yang(Research Engineer at Meta; Maintainer of PyTorch), Anton Osika Anton Osika(Cofounder of Lovable), and
3 more.

gptme by gptme

0.1%
4k
CLI tool for terminal agent workflows
Created 2 years ago
Updated 1 day ago
Feedback? Help us improve.