jmerelnyc
Autonomous, vision-grounded LLM agents for computer operation
Top 36.6% on SourcePulse
Summary
Photo Agents provides a runtime for autonomous, self-evolving LLM agents designed to operate computer systems by grounding their reasoning in visual screen content. Targeting developers and power users, it enables agents to perceive, reason, and act directly on the UI, offering local execution for data privacy and self-written skills for adaptive functionality.
How It Works
The system runs a streaming agent loop built around a perceive → reason → act cycle. Memory is vision-grounded: observations are stored in biologically inspired layers rather than as text transcripts. The agent writes its own skills from tasks it has completed successfully, so its capabilities evolve over time and its UI interactions stay anchored in visual context.
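The perceive → reason → act cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual code: `Observation`, `VisualMemory`, and `agent_loop` are hypothetical names, the "screen" is a plain string standing in for pixels, and the single bounded memory layer is a simplification of the layered memory the summary mentions.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Observation:
    """One captured view of the screen (hypothetical; a real agent would hold pixels)."""
    step: int
    screen: str


@dataclass
class VisualMemory:
    """Keeps recent observations instead of a text transcript (illustrative only)."""
    capacity: int = 3
    frames: List[Observation] = field(default_factory=list)

    def remember(self, obs: Observation) -> None:
        self.frames.append(obs)
        # Drop the oldest frame once this short-term layer is full.
        if len(self.frames) > self.capacity:
            self.frames.pop(0)


def agent_loop(
    perceive: Callable[[], str],
    reason: Callable[[VisualMemory], str],
    act: Callable[[str], None],
    max_steps: int = 10,
) -> Iterator[str]:
    """Yield each chosen action as it is decided (the 'streaming' part)."""
    memory = VisualMemory()
    for step in range(max_steps):
        memory.remember(Observation(step, perceive()))  # perceive
        action = reason(memory)                         # reason
        yield action
        if action == "done":
            return
        act(action)                                     # act


if __name__ == "__main__":
    # Toy environment: clicking "login" changes what the screen shows.
    screen = {"text": "login page"}

    def perceive() -> str:
        return screen["text"]

    def reason(memory: VisualMemory) -> str:
        return "done" if memory.frames[-1].screen == "dashboard" else "click:login"

    def act(action: str) -> None:
        if action == "click:login":
            screen["text"] = "dashboard"

    print(list(agent_loop(perceive, reason, act)))
```

Writing the loop as a generator lets a caller display or log each action as soon as it is chosen, and `max_steps` bounds the run if the reasoning step never returns `"done"`.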
Quick Start & Requirements
Installation: pip install photoagents, or pip install "photoagents[all]" to include the all extra. Requires Python 3.10+. A Photo Agents API key, validated via https://photo-agents.com, is mandatory. LLM provider credentials (OpenAI, Anthropic) must also be configured (e.g., in credentials.py). Run interactively with python -m photoagents, or launch a GUI client: Streamlit (pythonw -m photoagents.cli.launcher) or PyQt (python -m photoagents.clients.desktop_app).
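Before the first run, the two requirements that can be verified locally (Python 3.10+ and an installed photoagents package) can be checked with a short script. This is a sketch using only the standard library; the `preflight` helper is hypothetical and not part of the project, and it does not check the API key or the provider credentials, since the summary does not describe how those are stored.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements above


def preflight(
    version: Tuple[int, int] = sys.version_info[:2],
    packages: Sequence[str] = ("photoagents",),
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package '{name}' is not installed")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("problem:", issue)
    sys.exit(1 if issues else 0)
```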
Highlighted Details
Maintenance & Community
Project website: https://photo-agents.com. Active X/Twitter presence (https://x.com/photoagents) for updates and demos. Specific details on core maintainers, sponsorships, or dedicated community channels (Discord/Slack) are not provided in the README.
Licensing & Compatibility
Released under the MIT license, permitting broad usage, including commercial applications and linking within closed-source projects.
Limitations & Caveats
The software is in beta, with APIs subject to change before 1.0. A remote-validated API key is a prerequisite for runtime operation, serving as an accountability gate.