noScribe by kaixxx

GUI tool for local AI-powered audio transcription

created 2 years ago
935 stars

Top 40.0% on sourcepulse

Project Summary

noScribe is a free, open-source desktop application for automated audio transcription, primarily targeting qualitative social researchers and journalists. It leverages OpenAI's Whisper and pyannote for transcription and speaker identification, offering a local, privacy-focused solution with an integrated editor for transcript refinement.

How It Works

noScribe utilizes Whisper for speech-to-text conversion and pyannote for speaker diarization. Users can select transcription quality (precise or fast) and configure options like language detection, pause marking, speaker identification, disfluency inclusion, and timestamp generation. The application processes audio locally, ensuring data privacy.
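To illustrate how a transcript and a diarization result can be combined, here is a minimal sketch that labels each transcribed segment with the speaker whose turn overlaps it most in time. It is not noScribe's actual implementation: the `Segment` and `Turn` classes, the `assign_speakers` function, and the speaker labels are hypothetical names chosen for illustration, and the timestamps are made up.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed stretch of audio (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class Turn:
    """A diarization result: who spoke during which interval (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="?"):
    """Label each segment with the speaker whose turn overlaps it the most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labelled.append((best_speaker, seg.text))
    return labelled


# Made-up example data: two transcript segments and two speaker turns.
segments = [
    Segment(0.0, 4.0, "How did you get into this field?"),
    Segment(4.5, 9.0, "Mostly by accident, to be honest."),
]
turns = [Turn(0.0, 4.2, "S00"), Turn(4.2, 9.5, "S01")]
labelled = assign_speakers(segments, turns)
```

A segment that overlaps no turn keeps the `unknown` label, which makes gaps in the diarization visible to whoever reviews the transcript.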

Quick Start & Requirements

  • Installation: Downloadable executables are provided for Windows, macOS (Apple Silicon and Intel), and Linux.
  • Prerequisites:
    • Windows: NVIDIA CUDA toolkit for GPU acceleration, which requires a GPU with at least 6 GB of VRAM.
    • macOS: Rosetta 2 on Apple Silicon Macs, to run Intel-based components.
  • Resource Footprint: Download size is approximately 3.7 GB. Transcription of a one-hour interview can take up to three hours and requires significant CPU resources.
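The resource figures above imply a worst-case ratio of about three hours of processing per hour of audio. The sketch below turns that into a rough planning estimate. It assumes processing time scales linearly with audio length, which the source does not state, and the function name and `slowdown` parameter are hypothetical.

```python
def worst_case_processing_hours(audio_hours, slowdown=3.0):
    """Rough upper-bound estimate of processing time.

    slowdown=3.0 reflects the "up to three hours per one-hour interview"
    figure; linear scaling with audio length is an assumption.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return audio_hours * slowdown


# A 90-minute interview, worst case:
estimate = worst_case_processing_hours(1.5)
```

Real run times depend on the hardware and on the quality setting (precise or fast), so this is only useful for deciding whether a job should run overnight.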

Highlighted Details

  • Runs entirely locally, no data sent to the internet.
  • Supports ~60 languages, with best performance for English, Spanish, Italian, Portuguese, and German.
  • Integrated editor allows synchronized playback of audio with transcript text for easy correction.
  • Handles speaker identification and can mark overlapping speech (experimental).

Maintenance & Community

  • Developed by Kai Dröge.
  • Translations are community-contributed and may require review.
  • Source code available on GitHub.

Licensing & Compatibility

  • License: GPL-3.0.
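  • Compatibility: Commercial use is permitted. GPL-3.0 is a copyleft license, so distributing modified versions or works that incorporate the code requires releasing them under GPL-3.0, which generally rules out integration into closed-source software.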

Limitations & Caveats

  • Requires a powerful computer for reasonable processing times; slower machines may require overnight processing.
  • Transcription quality is highly dependent on audio quality.
  • Known issues include potential AI text repetition loops on long files and experimental support for multilingual audio and overlapping speech. Non-verbal expressions are not transcribed.
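Repetition loops are usually easy to spot once a transcript is split into lines. As a reviewing aid, here is a minimal sketch that flags runs of identical consecutive lines. It is not part of noScribe; the function name, the `min_repeats` threshold, and the sample transcript are hypothetical.

```python
def find_repetition_runs(lines, min_repeats=3):
    """Return (start_index, count, text) for runs of identical consecutive lines.

    Comparison ignores case and surrounding whitespace; empty lines are skipped.
    """
    runs = []
    i = 0
    while i < len(lines):
        key = lines[i].strip().lower()
        j = i
        # Extend the run while the next line matches the current one.
        while j + 1 < len(lines) and lines[j + 1].strip().lower() == key:
            j += 1
        count = j - i + 1
        if key and count >= min_repeats:
            runs.append((i, count, lines[i].strip()))
        i = j + 1
    return runs


# Made-up transcript with a suspicious loop in the middle.
transcript = [
    "So we started the project in spring.",
    "Thank you.",
    "Thank you.",
    "Thank you.",
    "Thank you.",
    "And then the funding came through.",
]
runs = find_repetition_runs(transcript)
```

Flagged runs still need a human check against the audio, since a speaker may genuinely repeat a phrase.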

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 3
  • Issues (30d): 9
  • Star history: 177 stars in the last 90 days
