GenericAgent  by lsdefine

Autonomous PC agent for desktop automation

Created 1 month ago
661 stars

Top 50.8% on SourcePulse

GitHubView on GitHub
Project Summary

AI-powered PC agent loop for desktop automation and intelligent task execution.

This project provides a minimalist (~3,300 lines) AI-powered agent framework enabling LLMs to gain physical-level control over a PC's operating system, including browser, terminal, file system, and input devices. It targets technically savvy users seeking autonomous desktop automation, offering a self-growing skill tree without heavy dependencies like Electron or Docker.

How It Works

The core is a 92-line Sense-Think-Act loop (agent_loop.py) that utilizes 7 atomic tools (code execution, file I/O, web interaction, user prompts) to execute instructions. New tasks are learned, codified into Standard Operating Procedures (SOPs), and stored persistently. This "seed" philosophy allows the agent to autonomously discover, build, and remember new capabilities, effectively growing its own skill tree from a minimal codebase.

Quick Start & Requirements

  • Install: Clone the repository, then pip install streamlit pywebview.
  • Configure: Copy mykey_template.py to mykey.py and insert your LLM API key.
  • Launch: Run python launch.pyw for the GUI or python agentmain.py for CLI (tested on Android Termux).
  • Prerequisites: Python environment, LLM API key. Refer to WELCOME_NEW_USER.md for detailed bootstrap.

Highlighted Details

  • Extremely compact codebase (~3,300 lines) compared to alternatives.
  • Browser control injects JavaScript into the real browser via Tampermonkey, preserving session state.
  • Full OS control includes keyboard, mouse, screen vision, and ADB for mobile device interaction.
  • Achieves "dogfooding," with the agent building its own README and commit history.
  • Autonomous skill growth via learned SOPs, creating a personalized agent capability set.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap were found in the provided text.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license permits commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

The agent's functionality is dependent on an external LLM API key, introducing potential costs and third-party service reliance. The autonomous learning mechanism, while powerful, may require careful oversight to manage emergent behaviors or ensure task accuracy.

Health Check
Last Commit

20 hours ago

Responsiveness

Inactive

Pull Requests (30d)
8
Issues (30d)
16
Star History
645 stars in the last 30 days

Explore Similar Projects

Starred by Yiran Wu Yiran Wu(Coauthor of AutoGen), Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), and
3 more.

OS-Copilot by OS-Copilot

0.2%
2k
OS agent for automating daily tasks
Created 2 years ago
Updated 1 year ago
Feedback? Help us improve.