GenericAgent by lsdefine

Autonomous PC agent for desktop automation

Created 4 months ago

12,765 stars

Top 4.1% on SourcePulse

1 Expert Loves This Project

Li Jiang

Coauthor of AutoGen; Engineer at Microsoft

Project Summary

AI-powered PC agent loop for desktop automation and intelligent task execution.

This project provides a minimalist (~3,300 lines) AI-powered agent framework that gives LLMs direct control of a PC: the browser, terminal, file system, and input devices. It targets technically savvy users who want autonomous desktop automation, and it offers a self-growing skill tree without heavy dependencies such as Electron or Docker.

How It Works

The core is a 92-line Sense-Think-Act loop (agent_loop.py) that executes instructions with 7 atomic tools spanning code execution, file I/O, web interaction, and user prompts. As the agent learns new tasks, it codifies them into Standard Operating Procedures (SOPs) and stores them persistently. This "seed" philosophy lets the agent autonomously discover, build, and remember new capabilities, growing its own skill tree from a minimal codebase.
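To make the loop concrete, here is a minimal sketch of a Sense-Think-Act cycle with a tool registry. It is not the code in agent_loop.py: the names AgentState, run_loop, scripted_policy, and TOOLS are hypothetical, and a scripted function stands in for the LLM call so the example runs without an API key.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from GenericAgent.
Action = Tuple[str, str]  # (tool name, argument)


@dataclass
class AgentState:
    goal: str
    observations: List[str] = field(default_factory=list)
    done: bool = False


def run_loop(
    state: AgentState,
    think: Callable[[str, List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> AgentState:
    """Run a bounded Sense-Think-Act loop and return the final state."""
    for _ in range(max_steps):
        # Sense: gather what the agent has observed so far.
        context = list(state.observations)

        # Think: ask the policy (an LLM in a real agent) for the next action.
        tool_name, arg = think(state.goal, context)
        if tool_name == "finish":
            state.done = True
            state.observations.append(f"finish: {arg}")
            break

        # Act: dispatch to the chosen tool and record the result.
        tool = tools.get(tool_name)
        result = tool(arg) if tool else f"unknown tool: {tool_name}"
        state.observations.append(f"{tool_name}({arg}) -> {result}")
    return state


def scripted_policy(goal: str, context: List[str]) -> Action:
    """Stand-in for an LLM: act once, then declare the task finished."""
    if not context:
        return ("echo", goal)
    return ("finish", "done")


# A single toy tool; a real agent would register its atomic tools here.
TOOLS: Dict[str, Callable[[str], str]] = {"echo": lambda text: text.upper()}

final = run_loop(AgentState(goal="say hello"), scripted_policy, TOOLS)
```

The max_steps bound is a design choice of this sketch: it keeps a misbehaving policy from looping forever, which matters once the policy is a real model rather than a script.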

Quick Start & Requirements

Install: Clone the repository, then pip install streamlit pywebview.
Configure: Copy mykey_template.py to mykey.py and insert your LLM API key.
Launch: Run python launch.pyw for the GUI or python agentmain.py for the CLI (tested on Android Termux).
Prerequisites: Python environment, LLM API key. Refer to WELCOME_NEW_USER.md for detailed bootstrap instructions.
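The Configure step can also be scripted. The sketch below assumes only the two file names given above (mykey_template.py and mykey.py); the helper name ensure_key_file is hypothetical and not part of the project. It copies the template once and never overwrites an existing key file.

```python
import shutil
from pathlib import Path


def ensure_key_file(repo_dir: Path) -> Path:
    """Create mykey.py from mykey_template.py if it does not exist yet.

    Hypothetical helper for illustration; the project itself only
    documents the manual copy step.
    """
    template = repo_dir / "mykey_template.py"
    target = repo_dir / "mykey.py"

    # Never overwrite a key file the user has already filled in.
    if target.exists():
        return target

    if not template.exists():
        raise FileNotFoundError(f"template not found: {template}")

    shutil.copyfile(template, target)
    return target
```

Calling ensure_key_file(Path(".")) from the cloned repository root would perform the copy; the API key still has to be entered in mykey.py by hand.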

Highlighted Details

Extremely compact codebase (~3,300 lines) compared to alternatives.
Browser control injects JavaScript into the real browser via Tampermonkey, preserving session state.
Full OS control includes keyboard, mouse, screen vision, and ADB for mobile device interaction.
The project dogfoods itself: the agent builds its own README and commit history.
Autonomous skill growth via learned SOPs, creating a personalized agent capability set.
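One way to picture persistent SOPs is a small on-disk store keyed by task name. The sketch below is an assumption about the general idea, not GenericAgent's storage format: the class SopStore, its methods, and the JSON layout are all hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


class SopStore:
    """Hypothetical JSON-backed store mapping a task name to its steps."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._sops: Dict[str, List[str]] = {}
        # Reload previously learned procedures so skills survive restarts.
        if path.exists():
            self._sops = json.loads(path.read_text(encoding="utf-8"))

    def learn(self, task: str, steps: List[str]) -> None:
        """Record the steps for a task and write the store to disk."""
        self._sops[task] = list(steps)
        self.path.write_text(json.dumps(self._sops, indent=2), encoding="utf-8")

    def recall(self, task: str) -> Optional[List[str]]:
        """Return the stored steps for a task, or None if it is unknown."""
        return self._sops.get(task)
```

A new SopStore opened on the same file sees everything an earlier instance learned, which is the property the "skill tree" description relies on: capabilities accumulate across sessions instead of being rediscovered each time.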

Maintenance & Community

The source lists no maintainers, community channels (e.g., Discord, Slack), or project roadmap.

Licensing & Compatibility

License: MIT.
Compatibility: The MIT license permits commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

The agent depends on an external LLM API key, which introduces potential costs and reliance on a third-party service. The autonomous learning mechanism, while powerful, may need careful oversight to manage emergent behaviors and ensure task accuracy.

Health Check

Last Commit: 18 hours ago
Responsiveness: Inactive
Pull Requests (30d): 184
Issues (30d):
Star History: 1,652 stars in the last 30 days