whisper.api by innovatorved

Self-hostable API for speech-to-text transcription

created 2 years ago

896 stars

Top 41.3% on sourcepulse

Project Summary

This project provides a self-hostable API for speech-to-text transcription using a fine-tuned Whisper ASR model. It targets developers who need to integrate speech recognition into their applications, and offers per-user access via API keys along with quantized models for more efficient inference.

How It Works

The API uses a fine-tuned and quantized Whisper ASR model for accurate and fast speech-to-text conversion. It exposes a simple HTTP interface: users upload an audio file and receive a transcription in response. Quantized models are intended to reduce resource consumption and improve inference speed, which makes the service practical to self-host and integrate.
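To make the idea of quantization concrete, here is a minimal, generic sketch of symmetric quantization with one shared scale per block of weights. It is an illustration only: the function names are hypothetical, and the 5-bit width is an assumption suggested by the tiny.en.q5 model name. The scheme actually used by the project's models may differ.

```python
def quantize_block(values, bits=5):
    """Map floats to signed integers of the given bit width using one shared scale.

    Illustrative only; not the project's actual quantization scheme.
    """
    qmax = 2 ** (bits - 1) - 1              # largest representable magnitude
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0    # avoid dividing by zero for all-zero blocks
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quantized


def dequantize_block(scale, quantized):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.5, 0.0, 0.7]             # made-up example values
scale, quantized = quantize_block(weights)
restored = dequantize_block(scale, quantized)
```

Each weight is stored as a small integer instead of a full-width float, at the cost of a rounding error of at most half the scale per value. That trade-off is what allows a smaller memory footprint and faster inference.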

Quick Start & Requirements

Install ffmpeg: sudo apt install ffmpeg
Install Python dependencies: pip install -r requirements.txt
Run the project: uvicorn app.main:app --reload
Get an API token: send a POST request to /api/v1/users/get_token with your email and password.
Transcribe audio: send a POST request to /api/v1/transcribe/ with a model query parameter and the audio file as an upload.
Available models include tiny.en and quantized variants such as tiny.en.q5.
Documentation: https://github.com/innovatorved/whisper.api
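The two HTTP calls in the steps above can be sketched with Python's standard library. This is a sketch under stated assumptions, not the project's documented client. The endpoint paths and the model query parameter come from the steps above. Everything else is assumed and should be checked against the project documentation: the base URL (uvicorn's default host and port), the JSON field names, the multipart field name `file`, and the `Authentication` header. The helper names are hypothetical. The functions only build the requests and do not send them.

```python
import json
import urllib.parse
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default host and port


def build_token_request(email, password):
    """Build (but do not send) the POST request for /api/v1/users/get_token."""
    # Assumption: credentials are sent as a JSON body with these two keys.
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/v1/users/get_token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_transcribe_request(token, model, filename, audio_bytes):
    """Build (but do not send) the POST request for /api/v1/transcribe/."""
    query = urllib.parse.urlencode({"model": model})
    boundary = uuid.uuid4().hex
    # Hand-rolled multipart/form-data body with a single file part.
    # Assumption: the server expects the upload under the field name "file".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/transcribe/?{query}",
        data=body,
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authentication": token,  # assumption: header name may differ
        },
        method="POST",
    )
```

To send either request against a running server, pass it to `urllib.request.urlopen` and read the response body. The shape of the response is not described here, so inspect it before relying on specific fields.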

Highlighted Details

Fine-tuned Whisper ASR model for enhanced accuracy.
Quantized model optimization for faster, more efficient inference.
User-level access control with API keys.
Self-hostable architecture for full control.

Maintenance & Community

Primary author: Ved Gupta.
Support contact: vedgupta@protonmail.com.

Licensing & Compatibility

License: MIT.
The permissive license allows commercial use and integration into closed-source applications.

Limitations & Caveats

The example for obtaining a token uses placeholder credentials, and users are instructed to use tokens provided by an admin. This implies that token generation depends on a managed deployment or a separate setup process.

