ASR webservice API for speech recognition, translation, and language ID
This project provides a versatile webservice for Automatic Speech Recognition (ASR) built on OpenAI's Whisper models and their derivatives. It targets developers and researchers who need to integrate speech-to-text capabilities into applications, offering multiple engine options, output formats, and advanced features such as speaker diarization and VAD filtering.
How It Works
The service exposes a REST API built with Python, allowing users to submit audio files for transcription. It supports multiple ASR engines (OpenAI Whisper, Faster Whisper, WhisperX) and offers configurable model loading with an idle timeout to manage GPU memory. FFmpeg integration ensures broad audio/video format compatibility.
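For example, a request might look like the following sketch. The /asr and /detect-language endpoints, the multipart audio_file field, and the task/output query parameters are assumptions based on common deployments of this service; verify them against the running instance's interactive API docs.
# Transcribe a local file and return JSON (sketch; endpoint and parameters assumed)
curl -X POST "http://localhost:9000/asr?task=transcribe&output=json" -F "audio_file=@sample.wav"
# Translate the speech to English text instead
curl -X POST "http://localhost:9000/asr?task=translate&output=text" -F "audio_file=@sample.wav"
# Identify the spoken language
curl -X POST "http://localhost:9000/detect-language" -F "audio_file=@sample.wav"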
Quick Start & Requirements
Run with CPU:
docker run -d -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest
Run with GPU (requires the NVIDIA Container Toolkit):
docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest-gpu
To persist downloaded models across container restarts, mount a host cache directory:
-v $PWD/cache:/root/.cache/
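Putting it together, a GPU deployment with a persistent cache might look like the sketch below. The ASR_ENGINE value faster_whisper and the MODEL_IDLE_TIMEOUT variable are assumptions inferred from the engine list and the idle-timeout feature described above; check the project docs for the exact names.
# GPU container with a persistent model cache and a 5-minute idle unload (sketch)
docker run -d --gpus all -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=faster_whisper \
  -e MODEL_IDLE_TIMEOUT=300 \
  -v $PWD/cache:/root/.cache/ \
  onerahmet/openai-whisper-asr-webservice:latest-gpu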
Maintenance & Community
No specific community channels or notable contributors are mentioned in the README.
Licensing & Compatibility
The service bundles libraries from the FFmpeg project, which are licensed under the LGPL v2.1. The license for the core webservice itself is not explicitly stated in the README and may need clarification before commercial use.
Limitations & Caveats
As noted under Licensing, the absence of an explicit license for the core webservice could hinder commercial adoption. Model loading and unloading behavior is also undocumented, in particular memory management and potential race conditions when models are swapped under concurrent requests.