tambourine-voice by kstonekuan

Universal voice interface for seamless app dictation

Created 2 months ago
282 stars

Top 92.8% on SourcePulse

Project Summary

Tambourine offers a customizable, open-source voice interface for any application, acting as a privacy-focused alternative to proprietary dictation tools. It lets users dictate text naturally at the cursor, significantly faster than typing, with AI-powered formatting.

How It Works

A Tauri desktop app (Rust/React) captures audio via hotkeys and communicates with a Python FastAPI backend. The backend streams audio over WebRTC to the configured Speech-to-Text (STT) and Large Language Model (LLM) providers, either cloud services or local options such as Whisper and Ollama, for transcription and intelligent text cleaning (punctuation, filler-word removal, custom terminology). The processed text is returned to the app and inserted at the cursor. This modular design prioritizes user control over AI services and formatting rules.
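
A minimal, non-streaming sketch of that round trip is shown below; the /dictate route, response model, and transcribe/clean helpers are hypothetical placeholders rather than Tambourine's actual API, and the real backend streams audio over WebRTC instead of accepting a single upload.

    # Illustrative only: route, model fields, and helper functions are hypothetical.
    from fastapi import FastAPI, UploadFile
    from pydantic import BaseModel

    app = FastAPI()

    class DictationResult(BaseModel):
        text: str  # cleaned text the desktop app inserts at the cursor

    def transcribe(audio: bytes) -> str:
        # Stand-in for the configured STT provider (cloud or local Whisper).
        raise NotImplementedError

    def clean(raw: str) -> str:
        # Stand-in for the LLM formatting pass: punctuation, filler-word
        # removal, personal-dictionary substitutions.
        raise NotImplementedError

    @app.post("/dictate", response_model=DictationResult)
    async def dictate(audio: UploadFile) -> DictationResult:
        raw = transcribe(await audio.read())
        return DictationResult(text=clean(raw))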

Quick Start & Requirements

  • Primary install / run commands: build the Tauri app (cd app && pnpm install && pnpm dev) and run the Python server (cd server && uv sync && uv run python main.py); see the snippet after this list. Docker deployment is available for the server.
  • Non-default prerequisites and dependencies: Rust, Node.js, pnpm, Python 3.13+, uv (Python package manager). Linux requires specific development libraries (e.g., libwebkit2gtk-4.1-dev, build-essential). Microphone access and macOS Accessibility permissions are mandatory. API keys for chosen STT and LLM providers (e.g., Cartesia, Deepgram, OpenAI, Groq, Gemini) are required. Local STT/LLM requires Ollama and Whisper setup.
  • Links: CONTRIBUTING.md for development setup.
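
For reference, the documented commands laid out step by step; running the desktop app and the backend in separate terminals is an assumption, not something the summary states.

    # Desktop app (Tauri / React)
    cd app && pnpm install && pnpm dev

    # Python backend (Python 3.13+ with uv)
    cd server && uv sync && uv run python main.py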

Highlighted Details

  • Universal Dictation: Speak directly into any application at the cursor position via configurable hotkeys.
  • Extensive Provider Support: Integrates with numerous cloud STT/LLM services (AssemblyAI, AWS, Google, Groq, OpenAI, etc.) and supports fully local execution via Ollama/Whisper.
  • AI Text Formatting: Cleans dictation, adds punctuation, removes filler words, and respects personal dictionaries for custom terminology (see the sketch after this list).
  • Dual Recording Modes: Offers both hold-to-record (Ctrl+Alt+) and toggle recording (Ctrl+Alt+Space) modes.
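
As a rough illustration of that formatting step, the sketch below sends raw dictation to an LLM for cleanup; the prompt wording, model choice, and use of the OpenAI Python client are assumptions rather than Tambourine's actual implementation.

    # Hypothetical cleanup pass; the project's real prompts and provider
    # wiring will differ.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def format_dictation(raw: str, personal_terms: list[str]) -> str:
        system_prompt = (
            "Clean up dictated text: add punctuation, remove filler words "
            "(um, uh, like), and preserve these terms exactly: "
            + ", ".join(personal_terms)
        )
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": raw},
            ],
        )
        return response.choices[0].message.content

    print(format_dictation("um so the tauri app uh talks to fastapi", ["Tauri", "FastAPI"]))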

Maintenance & Community

  • Status: Actively developed; core features are functional, but expect breaking changes to code, architecture, and configuration.
  • Community: A Discord server is available for help and discussions. Contribution guidelines are detailed in CONTRIBUTING.md.

Licensing & Compatibility

  • License type: AGPL-3.0. This strong copyleft license requires that distributed derivative works be released under the same terms, which can complicate integrating it into closed-source commercial products without open-sourcing the entire product.
  • Compatibility notes: AGPL-3.0 permits commercial use but imposes significant obligations regarding source code availability for distributed modifications.

Limitations & Caveats

  • Development Stage: Under active development; subject to breaking changes.
  • Platform Support: Windows and macOS are fully supported; Linux support is partial. Mobile platforms (Android/iOS) are unsupported.
  • Setup: Requires managing multiple language environments (Rust, Node.js, Python) and obtaining API keys, which may present a barrier for less technical users.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 69
  • Issues (30d): 47
  • Star History: 72 stars in the last 30 days
