Discover and explore top open-source AI tools and projects—updated daily.
r-muresanAI-powered interactive screen guidance system
Top 89.7% on SourcePulse
Screen.vision provides an AI-powered assistant that guides users through complex on-screen tasks by analyzing their screen in real-time and delivering step-by-step instructions. It targets users needing help with software configuration, setup, or any digital workflow, offering a more intuitive alternative to static tutorials.
How It Works
The system captures the user's screen via browser APIs and employs a multi-model AI approach. A primary LLM (GPT-5.2) generates sequential instructions, while a vision model (Qwen3-VL) detects specific UI elements for precise guidance. Another model (Gemini 3 Flash) verifies task completion by comparing screen states before and after an action. This pipeline ensures users receive concise, actionable steps without information overload, with processing occurring in real-time.
Quick Start & Requirements
pnpm install), and backend dependencies (pip install -r requirements.txt).OPENAI_API_KEY) and OpenRouter (OPENROUTER_API_KEY) configured in a .env.local file.npm run dev to start the Next.js frontend (port 3000) and FastAPI backend (port 8000).Highlighted Details
Maintenance & Community
No specific details regarding maintainers, community channels (like Discord/Slack), or project roadmap are provided in the README.
Licensing & Compatibility
The project's license is not specified in the README. This omission prevents an immediate assessment of compatibility for commercial use or integration into closed-source projects.
Limitations & Caveats
Self-hosting requires obtaining and managing API keys for external AI services (OpenAI, OpenRouter), which may incur costs. The lack of a specified license is a significant blocker for determining usage rights. The system's effectiveness is dependent on the AI models' ability to accurately interpret diverse UI elements and user actions.
5 days ago
Inactive
askui
DevAgentForge
landing-ai