macOSpilot-ai-assistant by elfvingralf

macOS AI assistant answers questions about any application, in context and in audio

Created 2 years ago

1,156 stars

Top 33.3% on SourcePulse

2 Experts Love This Project

Benjamin Bolte

Cofounder of K-Scale Labs

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

macOSpilot is a voice- and vision-powered AI assistant that answers questions about any application running on macOS, in the context of the active window. It targets macOS users who want information or help without switching applications, through in-context, audio-based interaction.

How It Works

The assistant is built on NodeJS and Electron. When activated by a keyboard shortcut, it captures a screenshot of the active window and records the user's voice input. The recording is sent to OpenAI's Whisper API for transcription; the transcript is then sent, together with the screenshot, to the GPT Vision API for analysis. The response is displayed as an overlay on the active window and read aloud using OpenAI's TTS API. Because the context comes from a screenshot, the approach works with any application and supports natural-language interaction.
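
As a rough illustration of the step where the transcribed question and the screenshot are combined, the sketch below builds a request body in the shape OpenAI's Chat Completions API accepts for image input. It is not taken from the macOSpilot source: the function name buildVisionRequest, the max_tokens value, and the model name passed in are assumptions made for this example, and the screenshot is assumed to be already base64-encoded PNG data.

```typescript
// Request body shape for a Chat Completions call that mixes text and an image.
interface VisionRequest {
  model: string;
  max_tokens: number;
  messages: Array<{
    role: "user";
    content: Array<
      | { type: "text"; text: string }
      | { type: "image_url"; image_url: { url: string } }
    >;
  }>;
}

// Hypothetical helper (not from the project): pairs the transcribed question
// with a screenshot of the active window in a single user message.
function buildVisionRequest(
  transcript: string,       // text returned by the transcription step
  screenshotBase64: string, // assumption: PNG bytes, already base64-encoded
  model: string             // assumption: any vision-capable model name
): VisionRequest {
  return {
    model,
    max_tokens: 500, // arbitrary cap chosen for this sketch
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: transcript },
          {
            type: "image_url",
            // The image is embedded inline as a data URL.
            image_url: { url: `data:image/png;base64,${screenshotBase64}` },
          },
        ],
      },
    ],
  };
}
```

The returned object would be serialized as JSON and posted to the chat completions endpoint with the user's API key; the text of the reply could then be shown in the overlay and passed to the text-to-speech step.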

Quick Start & Requirements

Install: Clone the repository and run yarn install or npm install.
Run: Execute yarn start or npm start in the terminal.
Prerequisites: NodeJS, OpenAI API key.
Permissions: Requires screen recording, microphone, and file access permissions.
Packaging: Can be packaged into an .app using npm run package-mac.
Documentation: Video walkthrough available on YouTube.

Highlighted Details

Application-agnostic context awareness via screenshots.
Voice and text input options.
In-context overlay and audio-based responses.
Configurable keyboard shortcuts and window settings.

Maintenance & Community

The project is maintained by a self-taught developer, @ralfelfving on Twitter/X, who also shares tutorials on YouTube. The project is open-source, with potential for community contributions.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The OpenAI API key is stored unencrypted. Conversation state does not persist between application sessions. Screenshot and audio files are stored locally and overwritten by later captures rather than deleted automatically. The developer notes that the code may not be "beautiful nor efficient."

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days