Self-hostable API for speech-to-text transcription
This project provides a self-hostable API for speech-to-text transcription using a fine-tuned Whisper ASR model. It targets developers who need to integrate speech recognition into their applications, offering per-user access via API keys and optimized inference for efficient processing.
How It Works
The API uses a fine-tuned and quantized Whisper ASR model for accurate and fast speech-to-text conversion. It exposes a simple HTTP interface: users upload an audio file and receive a transcription in response. Quantization is intended to reduce resource consumption and improve inference speed, which makes the service practical to self-host and integrate.
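As a rough illustration of the upload flow described above, the sketch below builds (but does not send) a request to the /api/v1/transcribe/ endpoint with the model query parameter listed in the Quick Start. Several details are assumptions not confirmed by the source: the POST method, the multipart form field name "file", the bearer-token Authorization header, and the helper name build_transcribe_request. The base URL is uvicorn's default bind address.

```python
import urllib.parse
import urllib.request
import uuid

# uvicorn binds to 127.0.0.1:8000 by default; adjust for your deployment.
BASE_URL = "http://127.0.0.1:8000"


def build_transcribe_request(audio_bytes, filename, model, token, base_url=BASE_URL):
    """Build (but do not send) a multipart upload request for /api/v1/transcribe/."""
    boundary = uuid.uuid4().hex
    # Assumption: the server expects the audio in a form field named "file".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    # The model is passed as a query parameter, e.g. ?model=tiny.en.q5
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        f"{base_url}/api/v1/transcribe/?{query}",
        data=body,
        method="POST",  # Assumption: uploads use POST.
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            # Assumption: the token is sent as a bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    request = build_transcribe_request(b"<audio bytes>", "sample.wav", "tiny.en.q5", "YOUR_TOKEN")
    print(request.get_method(), request.full_url)
    # To actually send it against a running server:
    # urllib.request.urlopen(request).read()
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before pointing the client at a real deployment. Check the project's own API documentation for the actual field names and authentication scheme.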
Quick Start & Requirements
Prerequisite: ffmpeg (sudo apt install ffmpeg)
Install dependencies: pip install -r requirements.txt
Run the server: uvicorn app.main:app --reload
Get a token: /api/v1/users/get_token with email and password.
Transcribe: /api/v1/transcribe/ with model query parameter and audio file upload.
Models: tiny.en and quantized versions like tiny.en.q5.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The example for obtaining a token uses placeholder credentials, and users are told to use tokens issued by an admin. This implies that token generation depends on a managed deployment or a separate setup process.
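For orientation, a token request against /api/v1/users/get_token with an email and password might be assembled as in the sketch below. This is a guess at the shape of the call, not the project's documented usage: the POST method, the JSON body, the field names "email" and "password", and the helper name build_token_request are all assumptions, and the credentials shown are placeholders.

```python
import json
import urllib.request


def build_token_request(email, password, base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a request to /api/v1/users/get_token."""
    # Assumption: credentials are sent as a JSON object with these field names.
    payload = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/users/get_token",
        data=payload,
        method="POST",  # Assumption: the endpoint accepts POST.
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder credentials; replace with the ones issued for your deployment.
    request = build_token_request("user@example.com", "change-me")
    print(request.get_method(), request.full_url)
    # To actually send it against a running server:
    # urllib.request.urlopen(request).read()
```

If tokens are handed out by an admin, as the caveat suggests, this step may not be needed at all on the client side.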