Transcription app for journalists using OpenAI's Whisper ASR
Stage Whisper is a free, secure, and user-friendly transcription application for journalists and other users. It uses OpenAI's Whisper ASR model for accurate speech-to-text conversion and offers a graphical interface for managing and editing transcriptions, with the aim of making advanced ASR technology accessible to less technical users.
How It Works
The application comprises a Python backend that interfaces with OpenAI's Whisper library and a Node/Electron-based frontend for the user interface. This architecture separates the core ASR processing from the user interaction layer, allowing for independent development and potential scalability. The backend handles audio processing and transcription, while the frontend provides a GUI for file management, editing, and user interaction.
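As a rough illustration of the backend's role, the sketch below calls the openai-whisper package and turns its segments into timestamped lines that a GUI could display for editing. It is not taken from the Stage Whisper source: the function names transcribe_file, segments_to_lines, and format_timestamp are hypothetical, as is the choice of the "base" model.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments: List[Dict[str, Any]]) -> List[str]:
    """Turn Whisper-style segments into timestamped text lines.

    Each segment is expected to carry "start", "end", and "text" keys,
    which is the shape returned by openai-whisper's transcribe().
    """
    return [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in segments
    ]


def transcribe_file(path: str, model_name: str = "base") -> List[str]:
    """Transcribe one audio file (hypothetical helper, not the project's API)."""
    # Imported lazily so the formatting helpers work without the package.
    # Requires `pip install openai-whisper` and ffmpeg on PATH.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return segments_to_lines(result["segments"])
```

Keeping the formatting helpers free of any Whisper import mirrors the separation described above: the transcription step can change without touching the code that prepares text for the interface.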
Quick Start & Requirements
To develop Stage Whisper, users need Node, Yarn, Python 3.x, Rust, ffmpeg, and Poetry installed. A sample installation command for macOS is provided. To run the backend standalone, first run poetry install in the backend directory, then run poetry run python stagewhisper --input /path/to/audio/file.mp3 there. To start the Electron interface, run yarn followed by yarn dev in the electron directory.
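Before installing, it can help to check which of the listed tools are already available. The following is a minimal sketch, not part of the project: the executable names are assumptions (for example, cargo standing in for the Rust toolchain and python3 for Python 3.x, which may differ by platform), and missing_tools is a hypothetical helper.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names for the prerequisites listed above;
# adjust for your platform (e.g. "python" instead of "python3" on Windows).
REQUIRED_TOOLS = ("node", "yarn", "python3", "cargo", "ffmpeg", "poetry")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The lookup function is a parameter so the check can be exercised without depending on what is installed on the current machine.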
Maintenance & Community
The project is led by @PeterSterne, @filmgirl, @HarrisLapiroff, and @Crazy4Pi314, with @oenu leading frontend development. Feature requests and bug reports can be submitted via GitHub issues and discussions. A Discord server is available for project discussions.
Licensing & Compatibility
Stage Whisper-specific code is MIT licensed. OpenAI's Whisper is MIT licensed. The project will adhere to all licensing terms, including those for dependencies like FFmpeg.
Limitations & Caveats
The project is in the early stages of development, with a beta version planned for release soon. While a working prototype exists, users currently need to install multiple development dependencies.