TranscriptionSuite  by homelab-00

Local and private speech-to-text application

Created 11 months ago
372 stars

Top 76.1% on SourcePulse

Summary

TranscriptionSuite is a fully local, private Speech-To-Text application for users who prioritize data privacy and offline operation. It runs cross-platform, offers speaker diarization and an "Audio Notebook" mode, and integrates with LM Studio for AI chat. The result is a secure, self-hosted transcription tool with a choice of models and optional remote access.

How It Works

The user interface is an Electron-based dashboard that communicates with a Python backend. The backend supports multiple Speech-To-Text (STT) engines, including WhisperX, NVIDIA NeMo, and VibeVoice-ASR, and runs either with NVIDIA GPU acceleration or in CPU-only mode. Speaker diarization is provided by PyAnnote or by VibeVoice's native capabilities. The backend is Dockerized for streamlined deployment, and it can process work in parallel for faster transcription when hardware permits.

Quick Start & Requirements

  • Primary Install/Run: Docker or Podman is required. The dashboard application is downloaded separately from releases.
  • Prerequisites: Docker/Podman, NVIDIA Container Toolkit (for GPU acceleration on Linux/Windows), FUSE 2 (libfuse.so.2) for Linux AppImages, and a HuggingFace token for diarization model access.
  • Hardware: NVIDIA GPU recommended for acceleration; CPU mode is available. macOS runs in CPU mode only.
  • Links: Releases page for dashboard app download.
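
The prerequisites above can be checked before installation with a short preflight script. This is a minimal sketch, not part of TranscriptionSuite: the function name is hypothetical, `nvidia-ctk` is the command-line tool shipped with the NVIDIA Container Toolkit, and the `HF_TOKEN` environment variable is an assumption about where the HuggingFace token is kept (the application may expect it elsewhere). The FUSE check is only meaningful on Linux.

```python
import ctypes.util
import os
import shutil


def check_prerequisites(env=None, which=shutil.which,
                        find_library=ctypes.util.find_library):
    """Return a dict mapping each prerequisite to True (found) or False."""
    env = os.environ if env is None else env
    return {
        # Either Docker or Podman on PATH satisfies the runtime requirement.
        "container_runtime": any(which(cmd) for cmd in ("docker", "podman")),
        # Only needed for GPU acceleration on Linux/Windows.
        "nvidia_container_toolkit": which("nvidia-ctk") is not None,
        # Looks for the FUSE 2 shared library used by Linux AppImages.
        "fuse2": find_library("fuse") is not None,
        # Assumption: the token is exported as HF_TOKEN.
        "huggingface_token": bool(env.get("HF_TOKEN")),
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

A missing `nvidia_container_toolkit` entry is not fatal, since CPU mode is available.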

Highlighted Details

  • 100% Local & Private: All audio processing occurs on the user's machine; internet is only needed for initial model downloads.
  • Multi-Model Support: Integrates WhisperX (various sizes), NVIDIA NeMo Parakeet/Canary, and VibeVoice-ASR.
  • Advanced Features: Includes speaker diarization, multilingual support (90+ languages for Whisper), longform and live transcription, session file import, and system-wide keyboard shortcuts.
  • Remote Access: Supports secure remote connections via Tailscale (cross-network) or LAN (local network) using HTTPS and token authentication.
  • OpenAI-Compatible API: Exposes endpoints compatible with OpenAI's Audio API, allowing integration with other tools like Open-WebUI or LM Studio.
  • Performance: Claims transcription of 30 minutes of audio in under a minute with Whisper on an RTX 3060.
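
As an illustration of the OpenAI-compatible API and token authentication mentioned above, the sketch below builds (but does not send) a transcription request using only the Python standard library. Several details are assumptions rather than facts about TranscriptionSuite: the path `/v1/audio/transcriptions` and the `model` and `file` form fields follow OpenAI's Audio API, the token is assumed to be passed as a Bearer header, and the base URL, port, and model name `whisper-1` are placeholders.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, token, audio_bytes,
                                filename="audio.wav", model="whisper-1"):
    """Build an OpenAI-style multipart transcription request (not sent)."""
    boundary = uuid.uuid4().hex
    # Form fields that precede the raw audio payload.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary that terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            # Assumption: the server accepts the token as a Bearer header.
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    # Placeholder host, port, and token; substitute the real server values.
    req = build_transcription_request("https://localhost:8000", "my-token", b"")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect before pointing it at a real server.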

Maintenance & Community

The project is described as a personal hobby project developed by an engineer learning programming, with a commitment to fixing bugs and maintaining the application as long as it remains relevant. Contributions are welcomed, with a "Blackboard" mentioned for tracking issues and planned features.

Licensing & Compatibility

The project is licensed under the GNU General Public License v3.0 or later (GPLv3+). This is a strong copyleft license: derivative works that are distributed must also be released under the GPL, which generally rules out bundling the code into closed-source commercial products.

Limitations & Caveats

macOS does not support GPU acceleration and runs in CPU mode only. Linux AppImages depend on FUSE 2. Initial setup and model downloads can take 10-20 minutes. The developer states that they are not a professional software engineer and describes the code as "vibecoded", though core architectural decisions such as Dockerization are deliberate. Experimental OS support is mentioned, but the summary does not say which platforms it covers.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 7
  • Issues (30d): 27
  • Star History: 136 stars in the last 30 days
