TranscriptionSuite by homelab-00

Local and private speech-to-text application

Created 1 year ago

541 stars

Top 58.0% on SourcePulse

Project Summary

Summary

TranscriptionSuite is a fully local and private Speech-To-Text application for users who prioritize data privacy and offline use. It runs cross-platform, offers advanced features such as speaker diarization and an "Audio Notebook" mode, and integrates with LM Studio for AI chat. The result is a secure, self-hosted transcription solution with a flexible choice of models and options for remote access.

How It Works

The project pairs an Electron-based dashboard (the user interface) with a Python backend. The backend supports several Speech-To-Text (STT) engines, including WhisperX, NVIDIA NeMo, and VibeVoice-ASR, and runs either with NVIDIA GPU acceleration or on CPU only. Speaker diarization is provided by PyAnnote or by VibeVoice's native diarization. The backend is Dockerized for streamlined deployment and can process audio in parallel for faster transcription when the hardware allows.

Quick Start & Requirements

Primary Install/Run: Docker or Podman is required. The dashboard application is downloaded separately from the project's Releases page.
Prerequisites: Docker/Podman, NVIDIA Container Toolkit (for GPU acceleration on Linux/Windows), FUSE 2 (libfuse.so.2) for Linux AppImages, and a HuggingFace token for diarization model access.
Hardware: NVIDIA GPU recommended for acceleration; CPU mode is available. macOS runs in CPU mode only.
Links: Releases page for dashboard app download.
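A quick way to confirm these prerequisites before installing is a small preflight check. The Python sketch below is hypothetical and not part of TranscriptionSuite. It assumes the container runtime is invoked as `docker` or `podman`, treats the presence of `nvidia-smi` only as a hint that an NVIDIA driver is installed (it does not verify the NVIDIA Container Toolkit), tries to load `libfuse.so.2` for the AppImage requirement, and assumes the HuggingFace token is exported as the `HF_TOKEN` environment variable, which may not be how the project expects to receive it.

```python
import ctypes
import os
import shutil


def preflight() -> dict:
    """Hypothetical helper: report which prerequisites appear to be present."""
    # Container runtime: Docker or Podman must be on PATH.
    has_runtime = any(shutil.which(cmd) is not None for cmd in ("docker", "podman"))

    # nvidia-smi on PATH is only a rough hint that an NVIDIA driver exists;
    # it does not prove the NVIDIA Container Toolkit is configured.
    has_nvidia = shutil.which("nvidia-smi") is not None

    # Linux AppImages need FUSE 2: try to load libfuse.so.2 directly.
    # On systems without it (including macOS/Windows) this raises OSError.
    try:
        ctypes.CDLL("libfuse.so.2")
        has_fuse2 = True
    except OSError:
        has_fuse2 = False

    # Assumption: the HuggingFace token is supplied via the HF_TOKEN variable.
    has_hf_token = bool(os.environ.get("HF_TOKEN"))

    return {
        "container_runtime": has_runtime,
        "nvidia_gpu_hint": has_nvidia,
        "fuse2": has_fuse2,
        "hf_token": has_hf_token,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:18} {'ok' if ok else 'missing'}")
```

A `missing` result for `fuse2` only matters if you run the Linux AppImage, and a `missing` GPU hint simply means the backend would fall back to CPU mode.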

Highlighted Details

100% Local & Private: All audio processing occurs on the user's machine; internet is only needed for initial model downloads.
Multi-Model Support: Integrates WhisperX (various sizes), NVIDIA NeMo Parakeet/Canary, and VibeVoice-ASR.
Advanced Features: Includes speaker diarization, multilingual support (90+ languages for Whisper), longform and live transcription, session file import, and system-wide keyboard shortcuts.
Remote Access: Supports secure remote connections via Tailscale (cross-network) or LAN (local network) using HTTPS and token authentication.
OpenAI-Compatible API: Exposes endpoints compatible with OpenAI's Audio API, allowing integration with other tools like Open-WebUI or LM Studio.
Performance: The project claims Whisper can transcribe 30 minutes of audio in under a minute on an RTX 3060.
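Because the server exposes endpoints compatible with OpenAI's Audio API, a client should be able to send audio to it the same way it would to OpenAI's `POST /v1/audio/transcriptions`, as a multipart form with `file` and `model` fields. The stdlib-only Python sketch below builds such a request. The base URL, port, model name, and the use of an `Authorization: Bearer` header for the remote-access token are all assumptions, and `build_transcription_request` and `transcribe` are hypothetical helpers; check the project's documentation for the actual route, accepted model names, and authentication scheme.

```python
import json
import urllib.request
import uuid
from typing import Optional


def build_transcription_request(
    base_url: str,
    audio: bytes,
    filename: str,
    model: str,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Hypothetical helper: build an OpenAI-style transcription request."""
    boundary = uuid.uuid4().hex
    parts = [
        # Text field "model": the value the server accepts is an assumption.
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="model"\r\n\r\n'
            f"{model}\r\n"
        ).encode(),
        # File field "file": the raw audio bytes.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio
        + b"\r\n",
        # Closing boundary marks the end of the multipart body.
        f"--{boundary}--\r\n".encode(),
    ]
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if token:
        # Assumption: the remote-access token is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        headers=headers,
        method="POST",
    )


def transcribe(req: urllib.request.Request) -> str:
    """Send the request and return the "text" field of the JSON response."""
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.loads(resp.read())["text"]
```

For example, `transcribe(build_transcription_request("https://my-server.example:8443", audio_bytes, "meeting.wav", "whisper-1", token="..."))` would upload the audio and return the transcript text; `my-server.example:8443` and `whisper-1` are placeholders. Building the request separately from sending it keeps the multipart encoding easy to inspect and test without a running server.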

Maintenance & Community

The author describes the project as a personal hobby project developed by an engineer learning programming, and commits to fixing bugs and maintaining the application for as long as it remains relevant. Contributions are welcome, and a "Blackboard" is mentioned for tracking issues and planned features.

Licensing & Compatibility

The project is licensed under the GNU General Public License v3.0 or later (GPLv3+). This is a strong copyleft license: derivative works that are distributed must also be released under GPLv3+ with their source code available, which can restrict integration into closed-source commercial products without careful consideration.

Limitations & Caveats

On macOS the application runs in CPU mode only, with no GPU acceleration. Linux AppImages depend on FUSE 2. Initial setup and model downloads can take a significant amount of time (10-20 minutes). The developer states that they are not a professional software engineer and describes the code as "vibecoded", although core architectural decisions such as Dockerization were made deliberately. Experimental OS support is mentioned but not detailed.

Explore Similar Projects

whispering by braden-w

S.A.T.U.R.D.A.Y by GRVYDEV

RuntimeSpeechRecognizer by gtreshchev

transcribe by vivekuppal

unity-AI-Chat-Toolkit by zhangliwei7758

ChatdollKit by uezo

sherpa-ncnn by k2-fsa

Scriberr by rishikanthc

unmute by kyutai-labs

RealtimeVoiceChat by KoljaB

RTranslator by niedev

WhisperLiveKit by QuentinFuxa