tambourine-voice by kstonekuan

Universal voice interface for seamless app dictation

Created 2 months ago
282 stars

Top 92.8% on SourcePulse

Project Summary

Tambourine offers a customizable, open-source voice interface for any application, acting as a privacy-focused alternative to proprietary dictation tools. It lets users dictate text naturally at the cursor, significantly faster than typing, with AI-powered formatting.

How It Works

A Tauri desktop app (Rust/React) captures audio via hotkeys and communicates with a Python FastAPI backend. The backend streams audio over WebRTC to the configured Speech-to-Text (STT) and Large Language Model (LLM) providers, either cloud services or local options such as Whisper and Ollama, for transcription and intelligent text cleaning (punctuation, filler-word removal, custom terminology). The processed text is returned to the app and inserted at the cursor. This modular design prioritizes user control over AI services and formatting rules.
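
A minimal, non-streaming sketch of that round trip is shown below; the /dictate route, response model, and transcribe/clean helpers are hypothetical placeholders rather than Tambourine's actual API, and the real backend streams audio over WebRTC instead of accepting a single upload.

    # Illustrative only: route, model fields, and helper functions are hypothetical.
    from fastapi import FastAPI, UploadFile
    from pydantic import BaseModel

    app = FastAPI()

    class DictationResult(BaseModel):
        text: str  # cleaned text the desktop app inserts at the cursor

    def transcribe(audio: bytes) -> str:
        # Stand-in for the configured STT provider (cloud or local Whisper).
        raise NotImplementedError

    def clean(raw: str) -> str:
        # Stand-in for the LLM formatting pass: punctuation, filler-word
        # removal, personal-dictionary substitutions.
        raise NotImplementedError

    @app.post("/dictate", response_model=DictationResult)
    async def dictate(audio: UploadFile) -> DictationResult:
        raw = transcribe(await audio.read())
        return DictationResult(text=clean(raw))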

Quick Start & Requirements

  • Primary install / run commands: build the Tauri app (cd app && pnpm install && pnpm dev) and run the Python server (cd server && uv sync && uv run python main.py); see the snippet after this list. Docker deployment is available for the server.
  • Non-default prerequisites and dependencies: Rust, Node.js, pnpm, Python 3.13+, uv (Python package manager). Linux requires specific development libraries (e.g., libwebkit2gtk-4.1-dev, build-essential). Microphone access and macOS Accessibility permissions are mandatory. API keys for chosen STT and LLM providers (e.g., Cartesia, Deepgram, OpenAI, Groq, Gemini) are required. Local STT/LLM requires Ollama and Whisper setup.
  • Links: CONTRIBUTING.md for development setup.
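
For reference, the documented commands laid out step by step; running the desktop app and the backend in separate terminals is an assumption, not something the summary states.

    # Desktop app (Tauri / React)
    cd app && pnpm install && pnpm dev

    # Python backend (Python 3.13+ with uv)
    cd server && uv sync && uv run python main.py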

Highlighted Details

  • Universal Dictation: Speak directly into any application at the cursor position via configurable hotkeys.
  • Extensive Provider Support: Integrates with numerous cloud STT/LLM services (AssemblyAI, AWS, Google, Groq, OpenAI, etc.) and supports fully local execution via Ollama/Whisper.
  • AI Text Formatting: Cleans dictation, adds punctuation, removes filler words, and respects personal dictionaries for custom terminology (see the sketch after this list).
  • Dual Recording Modes: Offers both hold-to-record (Ctrl+Alt+) and toggle recording (Ctrl+Alt+Space) modes.
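
As a rough illustration of that formatting step, the sketch below sends raw dictation to an LLM for cleanup; the prompt wording, model choice, and use of the OpenAI Python client are assumptions rather than Tambourine's actual implementation.

    # Hypothetical cleanup pass; the project's real prompts and provider
    # wiring will differ.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def format_dictation(raw: str, personal_terms: list[str]) -> str:
        system_prompt = (
            "Clean up dictated text: add punctuation, remove filler words "
            "(um, uh, like), and preserve these terms exactly: "
            + ", ".join(personal_terms)
        )
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": raw},
            ],
        )
        return response.choices[0].message.content

    print(format_dictation("um so the tauri app uh talks to fastapi", ["Tauri", "FastAPI"]))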

Maintenance & Community

  • Status: Actively developed; core features are functional, but expect breaking changes to code, architecture, and configuration.
  • Community: A Discord server is available for help and discussions. Contribution guidelines are detailed in CONTRIBUTING.md.

Licensing & Compatibility

  • License type: AGPL-3.0. This strong copyleft license requires that distributed derivative works be released under the same terms, which can complicate integrating it into closed-source commercial products without open-sourcing the entire product.
  • Compatibility notes: AGPL-3.0 permits commercial use but imposes significant obligations regarding source code availability for distributed modifications.

Limitations & Caveats

  • Development Stage: Under active development; subject to breaking changes.
  • Platform Support: Windows and macOS are fully supported; Linux support is partial. Mobile platforms (Android/iOS) are unsupported.
  • Setup: Requires managing multiple language environments (Rust, Node.js, Python) and obtaining API keys, which may present a barrier for less technical users.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 69
  • Issues (30d): 47
  • Star History: 72 stars in the last 30 days
