ProctorAI by jam3scampbell

Multimodal AI app to discourage procrastination

Created 1 year ago
404 stars

Top 71.8% on SourcePulse

Project Summary

ProctorAI is a multimodal AI designed to combat procrastination by monitoring user screen activity and intervening when unproductive behavior is detected. It targets individuals seeking a more intelligent and adaptable productivity tool than traditional site blockers, offering personalized interventions based on user-defined work sessions and rules.

How It Works

ProctorAI captures screenshots at user-defined intervals and processes them with multimodal LLMs (e.g., Claude-3.5-Sonnet, GPT-4o, LLaVA). Users specify their work goals and acceptable/unacceptable behaviors for each session, allowing for nuanced rule enforcement. If procrastination is detected, the AI can take control of the screen, issue personalized verbal warnings via text-to-speech, and enforce a cooldown period for the user to cease the distracting activity. A "two-tier" mode is recommended for cost efficiency, using a local model like LLaVA as a router to pre-screen images before sending them to a more powerful, expensive model.
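The capture, check, and intervene cycle described above can be sketched in Python. This is a minimal illustration, not ProctorAI's code. The names TwoTierJudge and monitor are hypothetical, and the models are abstracted as plain callables. The sketch also assumes two behaviors: two-tier mode forwards a screenshot to the paid model only when the local router flags it, and the cooldown simply delays the next check.

```python
import time
from dataclasses import dataclass
from typing import Callable

# A "judge" is any callable that receives a screenshot (image bytes) plus the
# session rules and returns True if it thinks the user is procrastinating.
Judge = Callable[[bytes, str], bool]


@dataclass
class TwoTierJudge:
    """Hypothetical two-tier check: a cheap local router gates a paid model."""
    router: Judge        # e.g. LLaVA served locally via Ollama (assumed wiring)
    verifier: Judge      # e.g. Claude 3.5 Sonnet or GPT-4o (assumed wiring)
    verifier_calls: int = 0

    def __call__(self, screenshot: bytes, rules: str) -> bool:
        # Tier 1: the local model screens every capture without API cost.
        if not self.router(screenshot, rules):
            return False
        # Tier 2 (assumed behavior): only captures the router flags are sent
        # to the expensive model, which makes the final decision.
        self.verifier_calls += 1
        return self.verifier(screenshot, rules)


def monitor(capture: Callable[[], bytes], judge: Judge, rules: str,
            intervene: Callable[[], None], interval_s: float,
            cooldown_s: float, steps: int,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Check `steps` screenshots; return how many triggered an intervention."""
    interventions = 0
    for _ in range(steps):
        if judge(capture(), rules):
            intervene()               # e.g. take over the screen, speak a warning
            interventions += 1
            sleep(cooldown_s)         # give the user time to stop the activity
        else:
            sleep(interval_s)         # wait for the next scheduled capture
    return interventions


if __name__ == "__main__":
    # Stub judges keyed on fake "screenshots" so the sketch runs offline.
    frames = iter([b"editor", b"youtube cat video", b"youtube lecture", b"editor"])
    judge = TwoTierJudge(router=lambda img, r: b"youtube" in img,
                         verifier=lambda img, r: b"cat" in img)
    n = monitor(lambda: next(frames), judge, "Write thesis; lectures are fine",
                intervene=lambda: print("Get back to work!"),
                interval_s=0, cooldown_s=0, steps=4, sleep=lambda s: None)
    print(f"interventions={n}, paid-model calls={judge.verifier_calls}")
    # prints "Get back to work!" then "interventions=1, paid-model calls=2"
```

Injecting capture and sleep keeps the loop testable. In a real app, capture would grab the screen and the two judges would wrap actual model API calls.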

Quick Start & Requirements

  • Install via git clone, create a virtual environment, and run pip install -r requirements.txt.
  • Execute the GUI with ./run.sh.
  • Requires macOS (a Windows version is available in the windows branch).
  • API keys for the chosen LLM providers (OpenAI, Anthropic, Gemini) and for ElevenLabs (used for TTS) must be set as environment variables.
  • For "two-tier" mode, Ollama and the LLaVA model are required.
  • Official documentation and setup guide: https://github.com/jam3scampbell/ProctorAI
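Before launching ./run.sh, a small preflight script can confirm that the required keys and tools are in place. This is a sketch under assumptions. The environment variable names are common SDK defaults, not confirmed from the ProctorAI README, and missing_keys and ollama_installed are hypothetical helpers. shutil.which is the standard-library way to check whether the ollama CLI is on PATH. Once Ollama is installed, the LLaVA model can be fetched with ollama pull llava.

```python
import os
import shutil
from typing import Iterable, List, Mapping, Optional

# ASSUMPTION: these variable names are common SDK defaults, not taken from the
# ProctorAI README; confirm the exact names in the repository before relying on them.
KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "elevenlabs": "ELEVEN_API_KEY",
}


def missing_keys(providers: Iterable[str],
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variable names that are unset or empty for the chosen providers."""
    env = os.environ if env is None else env
    return [KEY_VARS[p] for p in providers if not env.get(KEY_VARS[p])]


def ollama_installed() -> bool:
    """Two-tier mode needs Ollama; check whether its CLI is on PATH."""
    return shutil.which("ollama") is not None


if __name__ == "__main__":
    chosen = ["anthropic", "elevenlabs"]  # example provider selection
    for name in missing_keys(chosen):
        print(f"Missing environment variable: {name}")
    if not ollama_installed():
        print("Ollama not found; two-tier mode will not be available.")
```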

Highlighted Details

  • Leverages multimodal LLMs for context-aware procrastination detection.
  • Supports personalized session specifications for flexible rule enforcement.
  • Follows an "alive" design goal, aiming to give users an intuitive sense of being monitored.
  • Offers optional text-to-speech for verbal interventions.
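One way to picture a session specification is as structured data that gets turned into the prompt sent along with each screenshot. The sketch below assumes that design. SessionSpec, to_prompt, parse_verdict, and the PROCRASTINATING/PRODUCTIVE reply format are all hypothetical and are not taken from the ProctorAI source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionSpec:
    """Hypothetical container for one work session's rules (field names are illustrative)."""
    goal: str
    allowed: List[str] = field(default_factory=list)
    disallowed: List[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the rules as instructions for a multimodal model."""
        lines = [f"The user's goal for this session: {self.goal}"]
        if self.allowed:
            lines.append("Acceptable activities:")
            lines += [f"- {item}" for item in self.allowed]
        if self.disallowed:
            lines.append("Unacceptable activities:")
            lines += [f"- {item}" for item in self.disallowed]
        # Assumed reply format so the verdict can be parsed reliably.
        lines.append("Given the attached screenshot, answer PROCRASTINATING or "
                     "PRODUCTIVE, then give a one-sentence reason.")
        return "\n".join(lines)


def parse_verdict(reply: str) -> bool:
    """Return True if the model's reply starts with the PROCRASTINATING label."""
    return reply.strip().upper().startswith("PROCRASTINATING")


if __name__ == "__main__":
    spec = SessionSpec(goal="Finish the quarterly report",
                       allowed=["Google Docs", "music streaming in the background"],
                       disallowed=["Twitter", "YouTube"])
    print(spec.to_prompt())
    print(parse_verdict("PROCRASTINATING: the user is scrolling Twitter."))  # True
```

Keeping the rules as data rather than free text makes it easy to reuse a spec across sessions and to keep the model's reply in a fixed, parseable shape.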

Maintenance & Community

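The README describes the project as under active development, with a roadmap that includes finetuning models, enhanced session scheduling, and improved user interaction. However, the last commit was 8 months ago and the repository is currently listed as inactive. The README does not provide community links.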

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The main version targets macOS only; Windows support lives in a separate windows branch. The app relies on external LLM APIs (and on ElevenLabs for optional text-to-speech), which can incur usage costs. The README's "active development" label suggests that breaking changes or incomplete features are possible.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Jinze Bai (Research Scientist at Alibaba Qwen), and 4 more.

self-operating-computer by OthersideAI

0.1%
10k
Framework for multimodal computer operation
Created 2 years ago
Updated 1 month ago