noScribe is a free, open-source desktop application for automated audio transcription, primarily targeting qualitative social researchers and journalists. It leverages OpenAI's Whisper and pyannote for transcription and speaker identification, offering a local, privacy-focused solution with an integrated editor for transcript refinement.
How It Works
noScribe utilizes Whisper for speech-to-text conversion and pyannote for speaker diarization. Users can select transcription quality (precise or fast) and configure options like language detection, pause marking, speaker identification, disfluency inclusion, and timestamp generation. The application processes audio locally, ensuring data privacy.
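Conceptually, the pipeline produces two parallel outputs that must be merged: timestamped text segments from Whisper and timestamped speaker turns from pyannote. The sketch below illustrates one plausible merge heuristic — assign each segment the speaker whose turn overlaps it most. The data shapes and heuristic are illustrative assumptions, not noScribe's actual implementation.

```python
# Illustrative merge of Whisper-style segments (start, end, text) with
# pyannote-style speaker turns (start, end, label). Hypothetical sketch,
# not noScribe's actual code.

def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns):
    """Label each transcript segment with the best-overlapping speaker turn."""
    labeled = []
    for seg_start, seg_end, text in segments:
        best = max(turns, key=lambda t: overlap(seg_start, seg_end, t[0], t[1]),
                   default=(0.0, 0.0, "UNKNOWN"))
        has_overlap = overlap(seg_start, seg_end, best[0], best[1]) > 0
        labeled.append((best[2] if has_overlap else "UNKNOWN", text))
    return labeled

# Hypothetical example data:
segments = [(0.0, 2.5, "How did you get started?"),
            (2.6, 6.0, "I began in 2010.")]
turns = [(0.0, 2.4, "SPEAKER_00"), (2.5, 6.2, "SPEAKER_01")]

print(assign_speakers(segments, turns))
# [('SPEAKER_00', 'How did you get started?'), ('SPEAKER_01', 'I began in 2010.')]
```

In a real run, the segments would come from a speech-to-text model and the turns from a diarization pipeline; the merge step is what turns two independent timelines into a speaker-attributed transcript.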
Quick Start & Requirements
- Installation: Downloadable executables are provided for Windows, macOS (Apple Silicon and Intel), and Linux.
- Prerequisites:
- Windows: NVIDIA CUDA toolkit for optional GPU acceleration (requires a GPU with 6 GB+ VRAM).
- macOS: Rosetta 2 for Intel-based components on Apple Silicon.
- Resource Footprint: Download size is approximately 3.7 GB. Transcription of a one-hour interview can take up to three hours and requires significant CPU resources.
Highlighted Details
- Runs entirely locally; no audio or transcript data is sent over the internet.
- Supports ~60 languages, with best performance for English, Spanish, Italian, Portuguese, and German.
- Integrated editor allows synchronized playback of audio with transcript text for easy correction.
- Handles speaker identification and can mark overlapping speech (experimental).
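The editor's synchronized playback depends on per-segment timestamps. A minimal sketch of rendering a segment's start time (in seconds) as a clock mark — the HH:MM:SS format here is an assumption for illustration, not necessarily noScribe's exact output:

```python
# Hypothetical helper: render a time offset in seconds as an HH:MM:SS mark
# of the kind an editor could use to seek the audio to a transcript segment.
def to_timestamp(seconds: float) -> str:
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

print(to_timestamp(754.3))  # "00:12:34"
```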
Maintenance & Community
- Developed by Kai Dröge.
- Translations are community-contributed and may require review.
- Source code available on GitHub.
Licensing & Compatibility
- License: GPL-3.0.
- Compatibility: Free for commercial use; however, GPL-3.0's copyleft requires that derivative works and integrations also be distributed under GPL-3.0, which generally precludes embedding in closed-source software.
Limitations & Caveats
- Requires a powerful computer for reasonable processing times; on slower machines, a transcription job may need to run overnight.
- Transcription quality is highly dependent on audio quality.
- Known issues include AI text-repetition loops on long files; support for multilingual audio and overlapping speech remains experimental.
- Non-verbal expressions are not transcribed.