mirrormate by orangekame3

Voice-first AI assistant for mirrors

Created 1 week ago

305 stars

Top 87.9% on SourcePulse

GitHub: https://github.com/orangekame3/mirrormate
Project Summary

This project provides a self-hosted, voice-first AI assistant designed to be embedded in everyday objects, specifically mirrors, offering personalized interactions without requiring a separate screen or a cloud dependency. It targets users interested in local AI, privacy-focused assistants, and novel physical interfaces, giving them quick access to information and conversational AI during routine moments such as a morning mirror check.

How It Works

MirrorMate uses a voice-first architecture that can run entirely locally: Ollama for the LLM/VLM and embeddings, VOICEVOX for text-to-speech (TTS), and Whisper for speech-to-text (STT). Retrieval-Augmented Generation (RAG) powers a personalized memory system, letting the assistant recall and reuse information from past conversations. Deployment is flexible: a minimal setup runs on a Raspberry Pi with an OpenAI API key, while a more powerful machine (e.g., a Mac Studio or another GPU box) supports fully offline operation. The frontend is built with Next.js, React, and Three.js; the backend is Node.js with SQLite.
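
To make the pipeline concrete, here is a minimal TypeScript sketch of one conversational turn wired against the stock Ollama and VOICEVOX HTTP APIs (assuming their default ports, 11434 and 50021). This illustrates the general flow, not MirrorMate's actual code; the model name matches the Quick Start below, while the VOICEVOX speaker ID is an arbitrary assumption.

    // One turn: transcribed text in, synthesized WAV audio out.
    async function answerAndSpeak(transcript: string): Promise<ArrayBuffer> {
      // 1. LLM: ask Ollama's chat API for a reply.
      const chat = await fetch("http://localhost:11434/api/chat", {
        method: "POST",
        body: JSON.stringify({
          model: "qwen2.5:14b",
          messages: [{ role: "user", content: transcript }],
          stream: false,
        }),
      });
      const reply: string = (await chat.json()).message.content;

      // 2. TTS: VOICEVOX is a two-step API: build an audio query, then synthesize it.
      const query = await fetch(
        `http://localhost:50021/audio_query?speaker=1&text=${encodeURIComponent(reply)}`,
        { method: "POST" },
      );
      const wav = await fetch("http://localhost:50021/synthesis?speaker=1", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(await query.json()),
      });
      return wav.arrayBuffer(); // WAV bytes ready to play through the mirror's speaker
    }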

Quick Start & Requirements

  • Primary Install:
    • Easiest (OpenAI API): docker run -p 3000:3000 -e OPENAI_API_KEY=sk-xxx -e LLM_PROVIDER=openai -e TTS_PROVIDER=openai ghcr.io/orangekame3/mirrormate:latest
    • Fully Local (Ollama + VOICEVOX), run after pulling the model (a pre-flight sketch follows after this list):
      ollama pull qwen2.5:14b
      git clone https://github.com/orangekame3/mirrormate.git
      cd mirrormate
      docker compose up -d
  • Prerequisites: Docker, Ollama, VOICEVOX (for local setup). OpenAI API key (for easiest setup). A Raspberry Pi is suitable for minimal setups; a Mac Studio or other GPU machine is recommended for full local AI processing. Bun is used for development.
  • Links: Documentation, Getting Started, and Releases are available via the project's GitHub repository.
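
For the fully local path, it can help to confirm that Ollama and VOICEVOX are actually reachable before running docker compose up -d. Here is a small pre-flight sketch, runnable with Bun; this script is not part of the repository, the endpoints are the services' standard ones, and the ports are assumed defaults.

    // check-services.ts: ping each local dependency and report its status.
    const services = [
      { name: "Ollama", url: "http://localhost:11434/api/tags" },  // lists pulled models
      { name: "VOICEVOX", url: "http://localhost:50021/version" }, // engine version string
    ];

    for (const { name, url } of services) {
      try {
        const res = await fetch(url);
        console.log(`${name}: ${res.ok ? "OK" : `HTTP ${res.status}`}`);
      } catch {
        console.log(`${name}: unreachable at ${url}`);
      }
    }

Run it with: bun run check-services.ts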

Highlighted Details

  • Voice-First Interaction: Activated via a wake word ("Hey Mira"), supporting multiple Speech-to-Text (STT) providers including Web Speech API, OpenAI Whisper, and local Whisper.
  • Personalized Memory: Utilizes RAG to store and retrieve conversational context, enabling personalized responses and memory recall.
  • Expressive Avatar: Features a lip-synced avatar with distinct animation states (Idle, Listening, Thinking, Speaking) to visually indicate the AI's status.
  • Multi-Provider Support: Offers flexibility in choosing components: LLM (OpenAI, Ollama), TTS (OpenAI, VOICEVOX), STT (Web Speech API, OpenAI Whisper, Local Whisper), and Embedding (Ollama, PLaMo-Embedding-1B).
  • Built-in Integrations: Includes support for Weather (Open-Meteo), Calendar (Google Calendar), Web search (Tavily), Reminders, and Discord sharing.
  • Plugin System: An extensible architecture allows for the addition of custom widgets and sensor integrations, such as the Vision Companion plugin for eye contact detection.
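
The summary does not document the plugin API itself, so the following TypeScript sketch only illustrates what a custom widget or sensor plugin could look like; every name in it (MirrorPlugin, poll, onWake, the sensor endpoint) is a hypothetical assumption, not the project's real interface.

    // Hypothetical plugin shape: an ID, a polling hook that feeds a widget,
    // and an optional callback for when the wake word fires.
    interface MirrorPlugin {
      id: string;
      poll(): Promise<Record<string, unknown>>;
      onWake?(): void;
    }

    // Illustrative example: surface a room CO2 reading on the mirror.
    const co2Widget: MirrorPlugin = {
      id: "co2-widget",
      async poll() {
        const res = await fetch("http://localhost:8080/co2"); // made-up sensor endpoint
        return { ppm: await res.json() };
      },
    };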

Maintenance & Community

The README provides no details on maintainers, community channels (e.g., Discord/Slack), or a roadmap.

Licensing & Compatibility

The project is licensed under the MIT license. This permissive license generally allows for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

The project is explicitly described as "Work in progress," with core features functional but noted to have "rough edges." Users should expect ongoing development and potential instability.

Health Check

  • Last Commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 58
  • Issues (30d): 4
  • Star History: 309 stars in the last 13 days
