Speech Note is an application for Linux desktops and Sailfish OS that provides offline note-taking, reading, and translation using Speech-to-Text (STT), Text-to-Speech (TTS), and Machine Translation (MT) engines. It processes all data locally, which suits users who need private speech-to-text and translation without an internet connection.
How It Works
The application leverages a modular architecture, supporting multiple STT (Coqui STT, Vosk, Whisper.cpp, Faster Whisper, April-ASR), TTS (espeak-ng, Piper, RHVoice, Coqui TTS, Mimic 3, WhisperSpeech), and MT (Bergamot Translator) engines. This allows users to select and download models for various languages, offering flexibility in choosing the best-performing or most suitable engine for their needs. All processing is performed locally, ensuring data privacy and offline functionality.
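To make the engine grouping concrete, here is a minimal sketch that records the engines named above by task and looks them up. The dictionary layout and the `engines_for` helper are illustrative assumptions, not Speech Note's internal API.

```python
# Engine names come from the description above. The grouping into a
# dictionary is purely illustrative and is not Speech Note's own code.
ENGINES = {
    "stt": ["Coqui STT", "Vosk", "Whisper.cpp", "Faster Whisper", "April-ASR"],
    "tts": ["espeak-ng", "Piper", "RHVoice", "Coqui TTS", "Mimic 3", "WhisperSpeech"],
    "mt": ["Bergamot Translator"],
}


def engines_for(task):
    """Return the engine names for a task: 'stt', 'tts', or 'mt' (case-insensitive)."""
    try:
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(ENGINES[task.lower()])
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


if __name__ == "__main__":
    for task in ENGINES:
        print(task.upper(), "->", ", ".join(engines_for(task)))
```

Keeping the tasks as separate keys mirrors the idea that each task can be served by whichever engine the user prefers.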
Quick Start & Requirements
- Installation: Primarily via Flatpak:
  - Base: flatpak install net.mkiol.SpeechNote
  - NVIDIA Add-on: flatpak install net.mkiol.SpeechNote.Addon.nvidia
  - Arch Linux (AUR): dsnote or dsnote-git
  - openSUSE: zypper in speechnote
- Dependencies: Flatpak packages include heavy libraries like CUDA, ROCm, Torch, and Python. GPU acceleration add-ons are available for NVIDIA (recommended) and AMD (not recommended).
- Resources: Base Flatpak download is 0.9 GiB, unpacking to 3.2 GiB. NVIDIA add-on adds 3.7 GiB download / 6.4 GiB unpacked.
- Docs: https://github.com/mkiol/dsnote
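The sizes quoted under Resources add up quickly, so a rough pre-install check can be useful. The sketch below sums the figures from the list above and compares the unpacked total with the free space at a given path. The helper names and the default path `/` are assumptions; the actual Flatpak installation location may differ on your system.

```python
import shutil

GIB = 1024 ** 3

# (download GiB, unpacked GiB), as quoted in the Resources item above.
PACKAGES = {
    "base": (0.9, 3.2),
    "nvidia": (3.7, 6.4),
}


def footprint(selected):
    """Sum the download and unpacked sizes (in GiB) of the selected packages."""
    download = sum(PACKAGES[name][0] for name in selected)
    unpacked = sum(PACKAGES[name][1] for name in selected)
    return download, unpacked


def has_room(selected, path="/"):
    """Rough check: is the free space at `path` at least the unpacked size?"""
    _, unpacked = footprint(selected)
    return shutil.disk_usage(path).free >= unpacked * GIB


if __name__ == "__main__":
    down, unp = footprint(["base", "nvidia"])
    print(f"download ~{down:.1f} GiB, unpacked ~{unp:.1f} GiB")
    print("enough free space:", has_room(["base", "nvidia"]))
```

For the Base package plus the NVIDIA add-on this comes to roughly 4.6 GiB to download and 9.6 GiB on disk.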
Highlighted Details
- Supports over 60 languages for STT, TTS, and MT.
- Offers both "Base" (full features) and "Tiny" (basic features, smaller footprint) Flatpak packages.
- Extensive model browser for downloading STT, TTS, and MT models directly within the app.
- Custom model support via editing models.json.
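Hand-editing a JSON file is easy to get wrong, so it may help to back the file up first and confirm that it still parses afterwards. The sketch below does only that. It assumes nothing about the schema or location of models.json, neither of which is described here, and both helper names are hypothetical.

```python
import json
import shutil
from pathlib import Path


def backup_file(path):
    """Copy the file next to itself with a '.bak' suffix and return the backup path."""
    src = Path(path)
    dst = src.with_name(src.name + ".bak")
    shutil.copy2(src, dst)
    return dst


def is_valid_json(path):
    """Return True if the file parses as JSON. This checks syntax only, not the schema."""
    try:
        with open(path, encoding="utf-8") as fh:
            json.load(fh)
    except (OSError, ValueError):
        # ValueError covers json.JSONDecodeError and text decoding errors.
        return False
    return True
```

A typical flow would be to call `backup_file` before editing, then `is_valid_json` after saving, and restore the `.bak` copy if the check fails.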
Maintenance & Community
- Project hosted on GitHub and GitLab.
- Contributions welcome via PR/MR or issue reporting.
- Translations managed via Transifex.
- Support options include starring the repo, writing reviews, and donations via ko-fi or Liberapay.
Licensing & Compatibility
- Speech Note is licensed under the Mozilla Public License Version 2.0.
- Dependencies use a mix of MPL 2.0, Apache 2.0, MIT, BSD, LGPL, and GPL licenses. Notably, RHVoice and espeak-ng are GPL, and Mimic 3 is AGPL-3.0, which may have implications for linking in closed-source applications.
Limitations & Caveats
- Faster Whisper, Coqui TTS, and Mimic 3 models are only available on the x86-64 architecture.
- The AMD add-on is large, offers limited benefits, and may cause issues with ROCm 6.x.
- Some experimental models are marked as "likely doesn't work well."
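The architecture restriction in the first item can be expressed as a simple lookup. This is a minimal sketch, assuming the machine name reported by Python's `platform.machine()` is a reasonable proxy for the architecture; the `is_available` helper is hypothetical and not part of Speech Note.

```python
import platform

# Engines that the Limitations list above describes as x86-64 only.
X86_64_ONLY = {"Faster Whisper", "Coqui TTS", "Mimic 3"}

# Common spellings of x86-64 as reported by platform.machine(), lowercased.
X86_64_NAMES = {"x86_64", "amd64"}


def is_available(engine, machine=None):
    """Return False only for x86-64-only engines on a non-x86-64 machine."""
    machine = (machine or platform.machine()).lower()
    if engine in X86_64_ONLY:
        return machine in X86_64_NAMES
    return True


if __name__ == "__main__":
    for name in sorted(X86_64_ONLY | {"Piper", "Vosk"}):
        print(f"{name}: {'available' if is_available(name) else 'not available'}")
```

Passing `machine` explicitly (for example `"aarch64"`) lets you check another target without running on it.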