UI-TARS-desktop by bytedance

GUI agent app for computer control via natural language

created 6 months ago
15,415 stars

Top 3.2% on sourcepulse

Project Summary

This project provides UI-TARS Desktop, a GUI agent application that lets users control their computers through natural language. It uses the UI-TARS vision-language model to interpret on-screen interfaces and carry out commands, so users can automate tasks across the operating system and web applications.

How It Works

UI-TARS Desktop uses a vision-language model (VLM) to interpret the user's instruction together with the current state of the computer's graphical interface. It captures screenshots, performs visual recognition on them, and translates natural language commands into mouse and keyboard actions. Applications and system functions can therefore be controlled directly, without explicit scripting or coding.
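
The capture, interpret, and act cycle described above can be sketched as a small loop. This is a minimal illustration and not the project's actual implementation: the names `Action`, `run_agent`, and `scripted_model` are hypothetical, the "model" is a scripted stand-in for a real VLM call, and the screenshot is a placeholder byte string rather than a real screen capture.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    """Hypothetical action record; not the UI-TARS action format."""
    kind: str  # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    instruction: str,
    capture: Callable[[], bytes],
    model: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Loop: take a screenshot, ask the model for the next action, perform it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()  # current state of the GUI
        action = model(instruction, screenshot, history)
        history.append(action)
        if action.kind == "done":  # model signals the task is finished
            break
        execute(action)  # mouse or keyboard side effect
    return history


def scripted_model(instruction: str, screenshot: bytes, history: List[Action]) -> Action:
    """Stand-in for a VLM: replays a fixed script instead of reading the screenshot."""
    script = [
        Action("click", x=120, y=48),
        Action("type", text=instruction),
        Action("done"),
    ]
    return script[min(len(history), len(script) - 1)]


# Demo with fake capture and a recording executor (no real input events are sent).
performed: List[Action] = []
trace = run_agent(
    "hello",
    capture=lambda: b"fake-png",
    model=scripted_model,
    execute=performed.append,
)
print([a.kind for a in trace])
```

In a real agent, `capture` would grab the screen, `model` would send the instruction and image to the VLM, and `execute` would drive the mouse and keyboard. The `max_steps` cap is an assumed safeguard against a model that never reports completion.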

Quick Start & Requirements

  • Installation: See the project's Quick Start guide.
  • Prerequisites: Runs on Windows, macOS, and in the browser. Model requirements may vary; check Hugging Face or ModelScope for details.
  • Resources: Local processing is emphasized for privacy and security.

Highlighted Details

  • Powered by the UI-TARS Vision-Language Model for natural language control.
  • Supports screenshot and visual recognition for precise interaction.
  • Offers cross-platform compatibility (Windows, macOS, Browser).
  • Includes an experimental SDK for building custom GUI automation agents.

Maintenance & Community

  • Recent activity includes the release of v0.1.0 with a redesigned UI and support for the UI-TARS-1.5 model.
  • An SDK (@ui-tars/sdk) has been introduced.
  • Community engagement is available via Discord.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is at an early stage: the technical preview and the v0.1.0 release are both recent. Although the app is cross-platform, model performance and compatibility may vary. The SDK is marked as experimental.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 109
  • Issues (30d): 18

Star History

  • 2,458 stars in the last 90 days
