dsnote  by mkiol

Linux app for offline note-taking, reading, and translation

created 3 years ago
1,016 stars

Top 37.5% on sourcepulse

Project Summary

Speech Note is an application for Linux desktop and Sailfish OS that provides offline note-taking, reading, and translation using Speech-to-Text (STT), Text-to-Speech (TTS), and Machine Translation (MT) engines. It processes all data locally, which suits users who need private voice-to-text and translation without an internet connection.

How It Works

The application has a modular architecture that supports multiple STT engines (Coqui STT, Vosk, Whisper.cpp, Faster Whisper, April-ASR), TTS engines (espeak-ng, Piper, RHVoice, Coqui TTS, Mimic 3, WhisperSpeech), and one MT engine (Bergamot Translator). Users select and download models per language, so they can pick whichever engine performs best for their needs. All processing is performed locally, which keeps data private and lets the app work offline.
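
The engine families listed above can be pictured as a simple lookup table. The following is a minimal illustrative sketch in Python, not Speech Note's actual code: the ENGINES dictionary and the engines_for helper are hypothetical names, and only the engine names themselves come from the list above.

```python
# Hypothetical catalog of the engines named above, grouped by task.
# Illustration only; this does not mirror Speech Note's internals.
ENGINES = {
    "stt": ["Coqui STT", "Vosk", "Whisper.cpp", "Faster Whisper", "April-ASR"],
    "tts": ["espeak-ng", "Piper", "RHVoice", "Coqui TTS", "Mimic 3", "WhisperSpeech"],
    "mt": ["Bergamot Translator"],
}


def engines_for(task):
    """Return a copy of the engine list for a task ("stt", "tts", or "mt")."""
    try:
        return list(ENGINES[task.lower()])
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


if __name__ == "__main__":
    for task in ENGINES:
        print(task.upper(), "->", ", ".join(engines_for(task)))
```

Keeping the grouping as plain data makes it easy to see that STT and TTS each offer several interchangeable engines, while MT relies on a single one.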

Quick Start & Requirements

  • Installation: Primarily via Flatpak, with distribution packages for some systems:
    • Flatpak (base): flatpak install net.mkiol.SpeechNote
    • Flatpak (NVIDIA add-on): flatpak install net.mkiol.SpeechNote.Addon.nvidia
    • Arch Linux (AUR): dsnote or dsnote-git
    • openSUSE: zypper in speechnote
  • Dependencies: Flatpak packages include heavy libraries like CUDA, ROCm, Torch, and Python. GPU acceleration add-ons are available for NVIDIA (recommended) and AMD (not recommended).
  • Resources: The base Flatpak is a 0.9 GiB download that unpacks to 3.2 GiB. The NVIDIA add-on adds a further 3.7 GiB download (6.4 GiB unpacked).
  • Docs: https://github.com/mkiol/dsnote
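
For planning disk space, the sizes quoted above can simply be added up. The sketch below is a hypothetical helper (total_footprint and the two dictionaries are illustrative names, not part of Speech Note) that uses only the figures from the Resources item; for the base package plus the NVIDIA add-on it gives 4.6 GiB to download and 9.6 GiB unpacked.

```python
# Sizes in GiB, taken from the figures quoted above.
BASE = {"download": 0.9, "unpacked": 3.2}
NVIDIA_ADDON = {"download": 3.7, "unpacked": 6.4}


def total_footprint(*packages):
    """Sum download and unpacked sizes (GiB) across the given packages."""
    return {
        key: round(sum(pkg[key] for pkg in packages), 1)
        for key in ("download", "unpacked")
    }


if __name__ == "__main__":
    total = total_footprint(BASE, NVIDIA_ADDON)
    print(f"download: {total['download']} GiB, unpacked: {total['unpacked']} GiB")
```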

Highlighted Details

  • Supports over 60 languages for STT, TTS, and MT.
  • Offers both "Base" (full features) and "Tiny" (basic features, smaller footprint) Flatpak packages.
  • Extensive model browser for downloading STT, TTS, and MT models directly within the app.
  • Custom model support via editing models.json.

Maintenance & Community

  • Project hosted on GitHub and GitLab.
  • Contributions welcome via PR/MR or issue reporting.
  • Translations managed via Transifex.
  • Support options include starring the repo, writing reviews, and donations via ko-fi or Liberapay.

Licensing & Compatibility

  • Speech Note is licensed under the Mozilla Public License Version 2.0.
  • Dependencies use a mix of MPL 2.0, Apache 2.0, MIT, BSD, LGPL, and GPL licenses. Notably, RHVoice and espeak-ng are GPL, and Mimic 3 is AGPL-3.0, which may have implications for linking in closed-source applications.

Limitations & Caveats

  • Faster Whisper, Coqui TTS, and Mimic 3 models are only available on the x86-64 architecture.
  • The AMD add-on is large, offers limited benefits, and may cause issues with ROCm 6.x.
  • Some experimental models are marked as "likely doesn't work well."
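
The architecture restriction above can be checked before choosing an engine. This is a rough sketch under stated assumptions: is_engine_available and X86_64_ONLY are hypothetical names, the app itself may decide availability differently, and the sketch assumes that engines not named in the restriction are usable on other architectures, which the text does not confirm. It relies only on Python's standard platform.machine() call.

```python
import platform

# Engines the text above lists as x86-64 only.
X86_64_ONLY = {"Faster Whisper", "Coqui TTS", "Mimic 3"}

# Common spellings of x86-64 as reported by platform.machine().
X86_64_NAMES = {"x86_64", "amd64"}


def is_engine_available(engine, machine=None):
    """Return False only for x86-64-only engines on a non-x86-64 machine.

    Assumption: any engine not in X86_64_ONLY is treated as available.
    """
    machine = (machine or platform.machine()).lower()
    if engine in X86_64_ONLY:
        return machine in X86_64_NAMES
    return True


if __name__ == "__main__":
    for name in sorted(X86_64_ONLY | {"Piper"}):
        print(name, "->", is_engine_available(name))
```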

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 17
  • Star history: 154 stars in the last 90 days
