macOSpilot-ai-assistant  by elfvingralf

macOS AI assistant answers questions about any application, in context and in audio

created 1 year ago
1,153 stars

Top 34.2% on sourcepulse

Project Summary

macOSpilot is a voice- and vision-powered AI assistant that answers user questions about any application running on macOS, using the active window as context. It targets macOS users who want information or help without switching applications: questions are asked by voice or text, and answers appear in an overlay on the active window and are read aloud.

How It Works

The assistant is built on NodeJS and Electron. When activated by a keyboard shortcut, it captures a screenshot of the active window and records the user's spoken question. The audio is sent to OpenAI's Whisper API for transcription; the transcript and the screenshot are then sent to the GPT Vision API for analysis. The response is displayed as an overlay on the active window and read aloud using OpenAI's TTS API. Because the only context required is a screenshot, the approach works with any application and needs no app-specific integration.
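
The flow described above can be sketched as a small pipeline. This is a minimal illustration, not the project's actual code: the AssistantSteps interface, its method names, and runAssistant are all hypothetical, and the real OpenAI calls (Whisper, GPT Vision, TTS) are left behind injected functions so the order of operations is visible without assuming any particular SDK signature. The order of the screenshot and the recording is also an assumption.

```typescript
// Hypothetical step signatures; in the real project these would wrap
// Electron capture code and OpenAI's Whisper, GPT Vision, and TTS APIs.
interface AssistantSteps {
  captureScreenshot(): Promise<Uint8Array>; // image bytes of the active window
  recordQuestion(): Promise<Uint8Array>; // recorded microphone audio
  transcribe(audio: Uint8Array): Promise<string>; // speech-to-text
  analyze(question: string, screenshot: Uint8Array): Promise<string>; // vision model
  showOverlay(answer: string): void; // draw the answer over the active window
  speak(answer: string): Promise<void>; // text-to-speech playback
}

interface AssistantResult {
  question: string;
  answer: string;
}

// Runs one question/answer round trip in the order the summary describes.
async function runAssistant(steps: AssistantSteps): Promise<AssistantResult> {
  // Assumption: the screenshot is taken before recording starts.
  const screenshot = await steps.captureScreenshot();
  const audio = await steps.recordQuestion();

  // The audio becomes text first; only the text and image reach the vision model.
  const question = await steps.transcribe(audio);
  const answer = await steps.analyze(question, screenshot);

  // The answer is shown in context, then read aloud.
  steps.showOverlay(answer);
  await steps.speak(answer);

  return { question, answer };
}
```

Passing the steps in as an object keeps the sketch runnable with stub functions, and it makes clear that only the transcript and the screenshot reach the vision model, never the raw audio.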

Quick Start & Requirements

  • Install: Clone the repository and run yarn install or npm install.
  • Run: Execute yarn start or npm start in the terminal.
  • Prerequisites: NodeJS, OpenAI API key.
  • Permissions: Requires screen recording, microphone, and file access permissions.
  • Packaging: Can be packaged into an .app using npm run package-mac.
  • Documentation: Video walkthrough available on YouTube.

Highlighted Details

  • Application-agnostic context awareness via screenshots.
  • Voice and text input options.
  • In-context overlay and audio-based responses.
  • Configurable keyboard shortcuts and window settings.

Maintenance & Community

The project is maintained by a self-taught developer, @ralfelfving on Twitter/X, who also shares tutorials on YouTube. The project is open source and open to community contributions.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The OpenAI API key is stored unencrypted. Conversation state does not persist between application sessions. Screenshot and audio files are stored locally; each new capture overwrites the previous one, but the files are not deleted automatically. The developer notes that the code may not be "beautiful nor efficient."

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days
