whisper-asr-webservice by ahmetoner

ASR webservice API for speech recognition, translation, and language ID

created 2 years ago
2,789 stars

Top 17.5% on sourcepulse

Project Summary

This project provides a versatile webservice for Automatic Speech Recognition (ASR) built on OpenAI's Whisper models and their derivatives. It targets developers and researchers who need to integrate speech-to-text capabilities into applications, and it offers multiple engine options, several output formats, and advanced features such as speaker diarization and voice activity detection (VAD) filtering.

How It Works

The service exposes a REST API built with Python, allowing users to submit audio files for transcription. It supports multiple ASR engines (OpenAI Whisper, Faster Whisper, WhisperX) and offers configurable model loading with an idle timeout to manage GPU memory. FFmpeg integration ensures broad audio/video format compatibility.
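The idle-timeout idea can be sketched as a small holder that loads a model on first use and drops it once no request has touched it for a set period. This is a hypothetical Python illustration, not the webservice's actual code: the `IdleModelCache` class, its `loader` callable, and the injectable `clock` are names invented here. A lock plus an in-use counter keeps the model from being unloaded while a request still holds it, which is one simple way to avoid load/unload races.

```python
import threading
import time
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Optional, TypeVar

M = TypeVar("M")


class IdleModelCache(Generic[M]):
    """Hypothetical lazy model holder that unloads after an idle timeout."""

    def __init__(self, loader: Callable[[], M], idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader              # e.g. a function that loads a Whisper model
        self._idle_timeout = idle_timeout  # seconds without use before unloading
        self._clock = clock                # injectable so tests can control time
        self._lock = threading.Lock()      # guards loading, use counting, and unloading
        self._model: Optional[M] = None
        self._active = 0                   # number of requests currently using the model
        self._last_used = clock()

    @contextmanager
    def use(self) -> Iterator[M]:
        """Yield the model, loading it first if it is not in memory."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            self._active += 1
            model = self._model
        try:
            yield model
        finally:
            with self._lock:
                self._active -= 1
                self._last_used = self._clock()  # idle period starts when the request ends

    def release_if_idle(self) -> bool:
        """Drop the model if nobody is using it and the timeout has elapsed."""
        with self._lock:
            idle_for = self._clock() - self._last_used
            if self._model is not None and self._active == 0 and idle_for >= self._idle_timeout:
                self._model = None  # release our reference so memory can be reclaimed
                return True
            return False

    @property
    def loaded(self) -> bool:
        return self._model is not None
```

Something still has to call `release_if_idle()` periodically, for example a background timer thread. Actually returning GPU memory to the device may also require framework-specific cleanup beyond dropping the Python reference.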

Quick Start & Requirements

  • Install/Run: Docker is the primary method.
    • CPU: docker run -d -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest
    • GPU: docker run -d --gpus all -p 9000:9000 -e ASR_MODEL=base -e ASR_ENGINE=openai_whisper onerahmet/openai-whisper-asr-webservice:latest-gpu
  • Prerequisites: Docker, NVIDIA Container Toolkit (for GPU).
  • Cache: Persist downloaded models across container restarts by adding -v $PWD/cache:/root/.cache/ to the docker run command.
  • Docs: https://ahmetoner.github.io/whisper-asr-webservice
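A minimal Python client sketch for a container started as above on port 9000, using only the standard library. It assumes the service accepts `POST /asr` with the audio uploaded as a multipart field named `audio_file` and query parameters `task`, `language`, `output`, and `encode`. Treat these names and values as assumptions to confirm against the Swagger UI of your running instance; the helper functions themselves are hypothetical.

```python
import mimetypes
import urllib.parse
import urllib.request
import uuid
from pathlib import Path
from typing import Optional, Tuple


def build_multipart(field: str, filename: str, data: bytes,
                    boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode a single file field as multipart/form-data; returns (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def asr_url(base: str, task: str = "transcribe", output: str = "txt",
            language: Optional[str] = None, encode: bool = True) -> str:
    """Build the /asr URL (assumed parameter names; verify in Swagger UI)."""
    params = {"task": task, "output": output, "encode": str(encode).lower()}
    if language:
        params["language"] = language  # omit to let the service detect the language
    return f"{base.rstrip('/')}/asr?{urllib.parse.urlencode(params)}"


def transcribe(path: str, base: str = "http://localhost:9000", **kwargs) -> str:
    """POST an audio file to a running service and return the response body as text."""
    audio = Path(path)
    body, content_type = build_multipart("audio_file", audio.name, audio.read_bytes())
    req = urllib.request.Request(asr_url(base, **kwargs), data=body,
                                 headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With the container running, a call such as `transcribe("sample.wav", output="srt")` (where `sample.wav` is a placeholder path) would return subtitles as a string.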

Highlighted Details

  • Supports OpenAI Whisper, Faster Whisper, and WhisperX engines.
  • Offers multiple output formats: text, JSON, VTT, SRT, TSV.
  • Includes word-level timestamps, VAD filtering, and speaker diarization (via WhisperX).
  • Features GPU acceleration and FFmpeg integration.
  • REST API with Swagger UI for easy interaction.
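To make the SRT output format concrete, the following sketch renders `(start, end, text)` segments as SRT cues: a 1-based index, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the cue text, and a blank line between cues. It illustrates the format only; the function names are hypothetical and this is not the service's own writer.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3725.5 -> '01:02:05,500'."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render timed segments as SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)
```

VTT differs mainly in its `WEBVTT` header and its use of a period instead of a comma before the milliseconds.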

Maintenance & Community

No specific community channels or notable contributors are mentioned in the README.

Licensing & Compatibility

The project uses libraries from the FFmpeg project under LGPLv2.1. The specific license for the core webservice is not explicitly stated in the README, which may require clarification for commercial use.

Limitations & Caveats

The README does not explicitly state the license for the core webservice, which could be a concern for commercial adoption. Model loading and unloading behavior, especially concerning memory management and potential race conditions, is not detailed.

Health Check

Last commit: 1 month ago
Responsiveness: 1 week
Pull Requests (30d): 2
Issues (30d): 7
Star History: 244 stars in the last 90 days
