dsnote by mkiol

Linux app for offline note-taking, reading, and translation

Created 4 years ago

1,367 stars

Top 29.1% on SourcePulse

Project Summary

Speech Note is an application for the Linux desktop and Sailfish OS that provides offline note-taking, reading, and translation using Speech-to-Text (STT), Text-to-Speech (TTS), and Machine Translation (MT) engines. It processes all data locally, which makes it suitable for users who need private voice-to-text and translation without an internet connection.

How It Works

The application has a modular architecture that supports multiple STT (Coqui STT, Vosk, Whisper.cpp, Faster Whisper, April-ASR), TTS (espeak-ng, Piper, RHVoice, Coqui TTS, Mimic 3, WhisperSpeech), and MT (Bergamot Translator) engines. Users select and download models per language, so they can pick the engine that performs best or fits their hardware. Because every engine runs on the local machine, the app works offline and no data leaves the device.

Quick Start & Requirements

Installation (primarily via Flatpak):
- Flatpak, base package: flatpak install net.mkiol.SpeechNote
- Flatpak, NVIDIA add-on: flatpak install net.mkiol.SpeechNote.Addon.nvidia
- Arch Linux (AUR): dsnote or dsnote-git
- openSUSE: zypper in speechnote
Dependencies: Flatpak packages include heavy libraries like CUDA, ROCm, Torch, and Python. GPU acceleration add-ons are available for NVIDIA (recommended) and AMD (not recommended).
Resources: Base Flatpak download is 0.9 GiB, unpacking to 3.2 GiB. NVIDIA add-on adds 3.7 GiB download / 6.4 GiB unpacked.
Docs: https://github.com/mkiol/dsnote
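
The sizes quoted under Resources can be added up to estimate how much disk space an installation needs. The following is a minimal sketch of that arithmetic, not part of Speech Note itself; the names PACKAGES, required_gib, and has_room are hypothetical, and the check compares free space against the unpacked size only, so a real install may need additional temporary space.

```python
import os
import shutil

GIB = 1024 ** 3

# Sizes in GiB as quoted in the Resources line: (download, unpacked).
# The dictionary keys are illustrative labels, not Flatpak identifiers.
PACKAGES = {
    "base": (0.9, 3.2),
    "nvidia_addon": (3.7, 6.4),
}


def required_gib(selected):
    """Return total (download, unpacked) size in GiB for the selected packages."""
    download = sum(PACKAGES[name][0] for name in selected)
    unpacked = sum(PACKAGES[name][1] for name in selected)
    return round(download, 1), round(unpacked, 1)


def has_room(path, selected):
    """Return True if the filesystem holding `path` has room for the unpacked size."""
    _, unpacked = required_gib(selected)
    free_gib = shutil.disk_usage(path).free / GIB
    return free_gib >= unpacked


if __name__ == "__main__":
    wanted = ["base", "nvidia_addon"]
    download, unpacked = required_gib(wanted)
    print(f"download: {download} GiB, unpacked: {unpacked} GiB")
    # Assumption: the home directory sits on the filesystem Flatpak installs to.
    print("enough free space:", has_room(os.path.expanduser("~"), wanted))
```

With both packages selected, the totals come to 4.6 GiB to download and 9.6 GiB unpacked.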

Highlighted Details

Supports over 60 languages for STT, TTS, and MT.
Offers both "Base" (full features) and "Tiny" (basic features, smaller footprint) Flatpak packages.
Extensive model browser for downloading STT, TTS, and MT models directly within the app.
Custom model support via editing models.json.

Maintenance & Community

Project hosted on GitHub and GitLab.
Contributions welcome via PR/MR or issue reporting.
Translations managed via Transifex.
Support options include starring the repo, writing reviews, and donations via ko-fi or Liberapay.

Licensing & Compatibility

Speech Note is licensed under the Mozilla Public License Version 2.0.
Dependencies use a mix of MPL 2.0, Apache 2.0, MIT, BSD, LGPL, and GPL licenses. Notably, RHVoice and espeak-ng are GPL, and Mimic 3 is AGPL-3.0, which may have implications for linking in closed-source applications.

Limitations & Caveats

Faster Whisper, Coqui TTS, and Mimic 3 models are available only on the x86-64 architecture.
The AMD add-on is large, offers limited benefits, and may cause issues with ROCm 6.x.
Some experimental models are marked as "likely doesn't work well."
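
The x86-64 restriction above can be read as a simple filter over the engine list given under How It Works. The sketch below illustrates that reading only; it is not how Speech Note decides which engines to offer, and ENGINES, X86_64_ONLY, and available_engines are hypothetical names introduced for the example.

```python
import platform

# Engines per task, as listed under "How It Works".
ENGINES = {
    "stt": ["Coqui STT", "Vosk", "Whisper.cpp", "Faster Whisper", "April-ASR"],
    "tts": ["espeak-ng", "Piper", "RHVoice", "Coqui TTS", "Mimic 3", "WhisperSpeech"],
    "mt": ["Bergamot Translator"],
}

# Engines whose models are described as x86-64 only.
X86_64_ONLY = {"Faster Whisper", "Coqui TTS", "Mimic 3"}


def available_engines(task, machine=None):
    """List engines for `task` on the given machine type (defaults to this host)."""
    machine = (machine or platform.machine()).lower()
    is_x86_64 = machine in ("x86_64", "amd64")
    return [e for e in ENGINES[task] if is_x86_64 or e not in X86_64_ONLY]


if __name__ == "__main__":
    for task in ENGINES:
        print(task, "->", ", ".join(available_engines(task)))
```

On a non-x86-64 machine such as aarch64, this filter leaves four STT engines and four TTS engines, while the MT list is unaffected.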

Health Check

Last Commit: 1 day ago
Responsiveness: 1 day
Pull Requests (30d): 2
Issues (30d): 10

Star History

47 stars in the last 30 days

