gemini-cursor by 13point5

Desktop AI cursor using Gemini 2.0 Flash

Created 1 year ago

335 stars

Top 81.8% on SourcePulse

1 Expert Loves This Project

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

This project provides an AI-powered desktop cursor that interacts with the user's screen and voice, leveraging Google's Gemini 2.0 Flash model. It's designed for users needing assistance with complex visual information, website navigation, or real-time AI tutoring.

How It Works

The application is an Electron app with a React and TypeScript frontend that calls Google's Gemini API. It uses Gemini's multimodal capabilities (vision, speech, and function calling) so the AI cursor can perceive the screen, understand spoken commands, and respond verbally. The architecture prioritizes low latency to keep the interaction real-time.
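To illustrate the function-calling part of this flow, here is a minimal sketch of how an app might route a model-issued function call to a local handler. It is not taken from the project's code: the `FunctionCall` shape, the `move_cursor` tool name, and the `dispatch` helper are all hypothetical, and no Gemini SDK calls are used.

```typescript
// Hypothetical shape of a function call emitted by the model:
// a tool name plus JSON arguments. Not the project's actual types.
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

interface ToolResult {
  ok: boolean;
  message: string;
}

interface Point {
  x: number;
  y: number;
}

// Stand-in for the on-screen AI cursor position; a real app would
// update an overlay instead of a plain object.
const cursor: Point = { x: 0, y: 0 };

type ToolHandler = (args: Record<string, unknown>) => ToolResult;

// Registry of tools the model may call. "move_cursor" is an illustrative name.
const tools = new Map<string, ToolHandler>([
  [
    "move_cursor",
    (args) => {
      const { x, y } = args;
      // Model output is untrusted, so validate argument types before acting.
      if (typeof x !== "number" || typeof y !== "number") {
        return { ok: false, message: "x and y must be numbers" };
      }
      cursor.x = x;
      cursor.y = y;
      return { ok: true, message: `cursor moved to (${x}, ${y})` };
    },
  ],
]);

// Look up the requested tool and run it, or report an unknown tool.
function dispatch(call: FunctionCall): ToolResult {
  const handler = tools.get(call.name);
  if (handler === undefined) {
    return { ok: false, message: `unknown tool: ${call.name}` };
  }
  return handler(call.args);
}
```

In a design like this, the result returned by `dispatch` would typically be sent back to the model so it can continue the conversation. Validating arguments first keeps a malformed call from moving the cursor.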

Quick Start & Requirements

Install dependencies: npm install
Run the app: npm run start
Prerequisites: Node.js (v16+), npm, Gemini API key.
Setup involves cloning the repo, installing dependencies, running the app, and entering the API key.
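The prerequisites above can be checked before launching. The following is a small sketch of such a preflight check, not part of the repository: the `checkPrerequisites` function and its parameters are hypothetical, and how the app actually stores or reads the API key is not specified in the summary.

```typescript
// Minimum Node.js major version, taken from the prerequisites listed above.
const MIN_NODE_MAJOR = 16;

// Returns a list of human-readable problems; an empty list means all checks passed.
// Hypothetical helper for illustration only.
function checkPrerequisites(
  nodeVersion: string,
  apiKey: string | undefined
): string[] {
  const problems: string[] = [];

  // Node reports versions like "v18.17.0"; extract the major component.
  const match = /^v?(\d+)\./.exec(nodeVersion);
  const major = match ? Number(match[1]) : NaN;

  if (Number.isNaN(major)) {
    problems.push(`could not parse Node.js version "${nodeVersion}"`);
  } else if (major < MIN_NODE_MAJOR) {
    problems.push(`Node.js v${MIN_NODE_MAJOR}+ required, found ${nodeVersion}`);
  }

  // An absent or whitespace-only key counts as missing.
  if (apiKey === undefined || apiKey.trim() === "") {
    problems.push("Gemini API key is missing");
  }

  return problems;
}
```

In a Node.js script, this could be called with `process.version` and whatever key the user has supplied, printing each returned problem before exiting.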

Highlighted Details

AI cursor can see, hear, and speak.
Real-time interaction with low latency.
Supports use cases like understanding diagrams, navigating websites, and AI tutoring.

Maintenance & Community

The project is maintained by @13point5. Further community or roadmap information is not detailed in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify compatibility for commercial or closed-source use.

Limitations & Caveats

The project is experimental and depends on the Gemini 2.0 Flash model, which may have limitations or change over time. A Gemini API key is required to use the app.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

1 star in the last 30 days