Fast audio transcription API
Top 90.6% on sourcepulse
This project provides a highly optimized API for audio transcription using OpenAI's Whisper Large v3 model, targeting developers and businesses needing fast, scalable, and deployable speech-to-text solutions. It offers features like speaker diarization, asynchronous task management, and robust concurrency, significantly reducing transcription times.
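As a rough illustration of how such an API is typically called from a client, the sketch below builds a JSON transcription request with the standard library. The endpoint path, port, auth header name, and payload fields are assumptions for illustration, not documented by this project; only the ADMIN_KEY secret itself comes from the README.

```python
import json
import urllib.request

# Hypothetical endpoint and auth header -- the real route and header name
# depend on the deployed app; adjust to match your deployment.
API_URL = "http://localhost:8000/"
ADMIN_KEY = "my-admin-key"

def build_transcription_request(audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a transcription task."""
    payload = json.dumps({"url": audio_url, "diarize": True}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-admin-api-key": ADMIN_KEY,  # assumed header name
        },
    )

req = build_transcription_request("https://example.com/audio.mp3")
print(req.method, req.get_header("Content-type"))
```

Sending the request (urllib.request.urlopen(req)) would then return the task result or, for asynchronous task management, a task identifier to poll.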
How It Works
The API leverages Hugging Face Transformers, Optimum, and flash-attn for accelerated inference. It employs fp16 precision, batching (up to 24 concurrent requests), and Flash Attention 2 for significant speedups. Speaker diarization is integrated via pyannote models, requiring Hugging Face authentication. The architecture is designed for high concurrency and parallel processing, making it suitable for production workloads.
Quick Start & Requirements
Pull the prebuilt Docker image yoeven/insanely-fast-whisper-api:latest or build from source. The README provides detailed instructions for Fly.io deployment (fly launch, then fly secrets set ADMIN_KEY=<your_token> HF_TOKEN=<your_hf_key>), for installing flash-attn (which has specific build requirements), and for running the server locally with uvicorn app.app:app.
Maintenance & Community
The project is part of JigsawStack, which offers managed APIs. The core code is based on the Insanely Fast Whisper CLI project by Vaibhav Srivastav. Community links are not explicitly provided in the README.
Licensing & Compatibility
The project is open source and deployable on any GPU cloud provider supporting Docker. Specific licensing details (e.g., MIT, Apache) are not stated in the README, so verify the license terms before relying on it for commercial use or closed-source integration.
Limitations & Caveats
The large Docker image size can lead to long initial deployment times. Speaker diarization requires accepting user conditions and providing a Hugging Face token. Fly.io machines may take up to 15 minutes to auto-shut down after idling, incurring costs if not manually stopped.