CapsWriter-Offline by HaujetZhao

Offline voice input tool for PC, transcribing speech to text

created 2 years ago
3,799 stars

Top 13.1% on sourcepulse

Project Summary

CapsWriter-Offline is a PC-based speech-to-text and transcription tool designed for offline use, offering unlimited duration, low latency, and high accuracy for Chinese and English input. It caters to users needing efficient voice typing, real-time transcription, and audio/video file subtitling, with features like customizable hotwords and automatic diary logging.

How It Works

The tool uses Sherpa-onnx with Alibaba's Paraformer model for speech recognition, plus a separate punctuation model. It runs as a client-server pair: the server loads the models and performs recognition, while the client captures audio, manages hotkeys, and sends data to the server. For file transcription, it relies on FFmpeg and generates SRT subtitles with word-level timestamps.
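To make the subtitle step concrete, here is a minimal sketch of how word-level timestamps could be grouped into SRT cues. It is not taken from the project's code: the `Word` type, the `words_to_srt` function, and the `max_gap` and `max_words` thresholds are all hypothetical, and joining words with spaces assumes English text (Chinese output would normally be joined without spaces).

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its timing (hypothetical structure)."""
    text: str
    start: float  # seconds from the start of the file
    end: float    # seconds from the start of the file


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SRT."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_gap=0.6, max_words=12):
    """Group words into cues, splitting on long pauses or long lines.

    max_gap and max_words are illustrative thresholds, not project settings.
    """
    cues, current = [], []
    for word in words:
        too_long = len(current) >= max_words
        paused = bool(current) and word.start - current[-1].end > max_gap
        if current and (too_long or paused):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        span = f"{srt_timestamp(cue[0].start)} --> {srt_timestamp(cue[-1].end)}"
        line = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{span}\n{line}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Word("hello", 0.0, 0.4), Word("world", 0.5, 0.9), Word("again", 2.0, 2.5)]
    print(words_to_srt(demo))
```

Splitting on pauses keeps each cue aligned with a natural phrase, and the word cap stops a long unbroken sentence from becoming a single oversized subtitle.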

Quick Start & Requirements

  • Windows: Download pre-built executables. Requires Microsoft Visual C++ Redistributable. Server requires Windows 10+ (64-bit) and 4GB RAM for model loading.
  • Linux/macOS: Install dependencies via requirements-server.txt and requirements-client.txt. macOS requires sudo for core_client.py and may need brew install protobuf.
  • Models: Download separately and place in the models folder.
  • FFmpeg: Required for MP3 audio saving and video transcription.
  • Docs: Video Tutorial, GitHub Releases
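The requirements above can be checked before launching anything. The following is a small pre-flight sketch, not part of the project: the `check_prerequisites` function is hypothetical, and it assumes the models folder is a directory named `models` relative to the working directory and that FFmpeg is discoverable on `PATH` as `ffmpeg`.

```python
import shutil
from pathlib import Path


def check_prerequisites(models_dir="models", ffmpeg_name="ffmpeg"):
    """Return a list of human-readable problems; an empty list means OK.

    Both defaults are assumptions for illustration, not project settings.
    """
    problems = []

    models_path = Path(models_dir)
    if not models_path.is_dir():
        problems.append(f"models folder not found: {models_path}")
    elif not any(models_path.iterdir()):
        problems.append(f"models folder is empty: {models_path}")

    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(ffmpeg_name) is None:
        problems.append(f"{ffmpeg_name} not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("warning:", problem)
```

A missing FFmpeg only matters for MP3 saving and file transcription, so a real check might treat it as a warning rather than a hard failure.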

Highlighted Details

  • Offline, low-latency, high-accuracy speech recognition.
  • Supports hotword customization for specific terms and rules.
  • Automatic diary logging of recognized text, with keyword-based categorization.
  • Drag-and-drop audio/video file transcription to SRT format.
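As an illustration of keyword-based diary categorization, the sketch below routes a recognized sentence to a category when it begins with a known keyword. The summary does not describe the project's actual matching rules, so the prefix-matching behavior, the function names `categorize` and `diary_line`, and the entry format are all assumptions.

```python
from datetime import datetime


def categorize(text, keywords):
    """Return (keyword, remaining_text) if text starts with a keyword.

    Prefix matching is an assumed rule; (None, text) means uncategorized.
    """
    for keyword in keywords:
        if text.startswith(keyword):
            # Drop the keyword and any separator that follows it.
            rest = text[len(keyword):].lstrip(" ,，:：")
            return keyword, rest
    return None, text


def diary_line(text, when):
    """Format one diary entry with a time-of-day prefix (assumed format)."""
    return f"[{when:%H:%M:%S}] {text}"


if __name__ == "__main__":
    category, body = categorize("health ran five km", ["health", "work"])
    print(category, "->", diary_line(body, datetime.now()))
```

Returning the category separately from the text leaves it to the caller to decide whether entries go to one file with tags or to a separate file per keyword.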

Maintenance & Community

  • Active development with releases on GitHub.
  • Community support via GitHub Issues. Docker support available via Garonix/CapsWriter-Offline.

Licensing & Compatibility

  • The README does not explicitly state a license. Hosting on GitHub does not by itself grant an open-source license, so check the repository's license file before assuming commercial use is permitted.

Limitations & Caveats

  • macOS users may encounter issues with the default Caps Lock hotkey and might need to switch to Right Shift.
  • Some dependencies may have compatibility issues with Python 3.11.
  • The project relies on specific versions of kaldi-native-fbank on Linux to avoid symbol errors.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 2

Star History

  • 219 stars in the last 90 days
