whisper-node  by ariym

Node.js bindings for OpenAI's Whisper, enabling local transcription

created 2 years ago
284 stars

Top 93.1% on sourcepulse

Project Summary

This project provides Node.js bindings for OpenAI's Whisper, enabling local audio transcription through whisper.cpp, ggerganov's CPU-optimized C++ port of the model. It targets Node.js developers who want on-device speech-to-text, with the data privacy and reduced latency that come from not sending audio to an external API.

How It Works

The library leverages the whisper.cpp project, a C++ implementation of OpenAI's Whisper model optimized for CPU execution, including Apple Silicon. It exposes this functionality through a Node.js interface, allowing developers to transcribe .wav files directly within their Node.js applications. The core advantage is the ability to run Whisper locally without relying on external APIs, making it suitable for privacy-sensitive applications or environments with limited internet connectivity.

Quick Start & Requirements

  • Install via npm: npm install whisper-node
  • For Windows: Install the make command.
  • Input files must be 16kHz .wav format. FFmpeg can be used for conversion (e.g., ffmpeg -i input.mp3 -ar 16000 output.wav).
  • Official documentation and examples are available in the README.
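
The FFmpeg conversion step above can be scripted from Node.js. The following is a minimal sketch, not part of whisper-node: the helper names buildFfmpegArgs and convertTo16kWav are hypothetical, and it assumes an ffmpeg binary is available on the PATH. It passes only the flags quoted in the example command.

```typescript
import { spawnSync } from "node:child_process";

// Build the argument list for the command quoted above:
//   ffmpeg -i <input> -ar 16000 <output>
// (hypothetical helper, not a whisper-node function)
function buildFfmpegArgs(inputPath: string, outputPath: string): string[] {
  return ["-i", inputPath, "-ar", "16000", outputPath];
}

// Run ffmpeg synchronously and throw if it cannot be started or fails.
// Assumes ffmpeg is installed and on the PATH.
function convertTo16kWav(inputPath: string, outputPath: string): void {
  const result = spawnSync("ffmpeg", buildFfmpegArgs(inputPath, outputPath), {
    stdio: "inherit",
  });
  if (result.error) {
    throw result.error; // e.g. ffmpeg not found
  }
  if (result.status !== 0) {
    throw new Error(`ffmpeg exited with code ${result.status}`);
  }
}
```

Calling convertTo16kWav("input.mp3", "output.wav") would then produce a file suitable for transcription. Passing the arguments as an array rather than a shell string avoids quoting problems with file names that contain spaces.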

Highlighted Details

  • Supports various output formats: JSON (default), .txt, .srt, .vtt.
  • Offers word-level timestamp precision.
  • Allows specifying Whisper model names (e.g., base.en) or custom model paths.
  • Includes options for language detection and file generation flags.
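
To illustrate how the JSON output relates to the subtitle formats, here is a small sketch that turns an in-memory list of timestamped segments into SRT text. The Segment shape (start, end, speech, with "HH:MM:SS.mmm" timestamps) is an assumption about the JSON output and should be checked against the README; segmentsToSrt is a hypothetical helper, not a library function.

```typescript
// Assumed shape of one transcript entry (verify against the README).
interface Segment {
  start: string; // e.g. "00:00:14.310"
  end: string;   // e.g. "00:00:16.480"
  speech: string;
}

// SRT timestamps use a comma before the milliseconds instead of a dot.
function toSrtTime(timestamp: string): string {
  return timestamp.replace(".", ",");
}

// Render segments as numbered SRT cues separated by blank lines.
function segmentsToSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n${seg.speech.trim()}\n`
    )
    .join("\n");
}
```

With word-level timestamps enabled, each segment would hold a single word, so the same function would yield one cue per word. When a subtitle file is all that is needed, the library's own .srt output is the simpler route.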

Maintenance & Community

The project is maintained by ariym. The roadmap lists planned support for config files, browser/WASM compatibility, automatic audio conversion, speaker diarization (pyannote, WhisperX), and audio stream transcription.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Only .wav files at 16kHz are currently supported as input; automatic conversion from other formats is planned but not yet available. The other roadmap items (config files, browser/WASM compatibility, speaker diarization, audio stream transcription) are likewise not yet implemented.
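
Because of the 16kHz requirement, it can help to check a file's sample rate before handing it to the transcriber. The sketch below reads the rate from a WAV header; readWavSampleRate and is16kHzWav are hypothetical helpers, and the code assumes the common layout in which the "fmt " chunk immediately follows the 12-byte RIFF header. Files with extra chunks before "fmt " are rejected rather than parsed.

```typescript
// Read the sample rate from a RIFF/WAVE header.
// Assumes "fmt " is the first chunk after the RIFF header (common, not guaranteed).
function readWavSampleRate(bytes: Uint8Array): number {
  if (bytes.length < 28) {
    throw new Error("File too short to contain a WAV header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Decode a 4-character ASCII tag at the given byte offset.
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3)
    );
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE" || tag(12) !== "fmt ") {
    throw new Error("Not a WAV file with a leading fmt chunk");
  }
  // The sample rate is a little-endian 32-bit integer at byte offset 24.
  return view.getUint32(24, true);
}

// True when the header reports a 16000 Hz sample rate.
function is16kHzWav(bytes: Uint8Array): boolean {
  return readWavSampleRate(bytes) === 16000;
}
```

In Node.js, the Buffer returned by fs.readFileSync is a Uint8Array, so it can be passed to is16kHzWav directly.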

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

13 stars in the last 90 days
