ASR webservice API for speech recognition, translation, and language ID
This project provides a versatile webservice for Automatic Speech Recognition (ASR) built on OpenAI's Whisper models and their derivatives. It targets developers and researchers who need to integrate speech-to-text capabilities into applications, offering multiple engine options, output formats, and advanced features such as speaker diarization and VAD filtering.
How It Works
The service exposes a REST API built with Python, allowing users to submit audio files for transcription. It supports multiple ASR engines (OpenAI Whisper, Faster Whisper, WhisperX) and offers configurable model loading with an idle timeout to manage GPU memory. FFmpeg integration ensures broad audio/video format compatibility.
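For example, a request might look like the following sketch. The /asr and /detect-language endpoints, the multipart audio_file field, and the task/output query parameters are assumptions based on common deployments of this service; verify them against the running instance's interactive API docs.
# Transcribe a local file and return JSON (sketch; endpoint and parameters assumed)
curl -X POST "http://localhost:9000/asr?task=transcribe&output=json" -F "audio_file=@sample.wav"
# Translate the speech to English text instead
curl -X POST "http://localhost:9000/asr?task=translate&output=text" -F "audio_file=@sample.wav"
# Identify the spoken language
curl -X POST "http://localhost:9000/detect-language" -F "audio_file=@sample.wav"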
Quick Start & Requirements
Run with CPU:
docker run -d -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest
Run with GPU (requires the NVIDIA Container Toolkit):
docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest-gpu
To persist downloaded models across container restarts, mount a host cache directory:
-v $PWD/cache:/root/.cache/
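Putting it together, a GPU deployment with a persistent cache might look like the sketch below. The ASR_ENGINE value faster_whisper and the MODEL_IDLE_TIMEOUT variable are assumptions inferred from the engine list and the idle-timeout feature described above; check the project docs for the exact names.
# GPU container with a persistent model cache and a 5-minute idle unload (sketch)
docker run -d --gpus all -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=faster_whisper \
  -e MODEL_IDLE_TIMEOUT=300 \
  -v $PWD/cache:/root/.cache/ \
  onerahmet/openai-whisper-asr-webservice:latest-gpu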
Maintenance & Community
No specific community channels or notable contributors are mentioned in the README.
Licensing & Compatibility
The service bundles libraries from the FFmpeg project, which are licensed under the LGPL v2.1. The license for the core webservice itself is not explicitly stated in the README and may need clarification before commercial use.
Limitations & Caveats
As noted under Licensing, the absence of an explicit license for the core webservice could hinder commercial adoption. Model loading and unloading behavior is also undocumented, in particular memory management and potential race conditions when models are swapped under concurrent requests.