insanely-fast-whisper-api by JigsawStack

Fast audio transcription API

created 1 year ago
296 stars

Top 90.6% on sourcepulse

Project Summary

This project provides a highly optimized API for audio transcription using OpenAI's Whisper Large v3 model, targeting developers and businesses needing fast, scalable, and deployable speech-to-text solutions. It offers features like speaker diarization, asynchronous task management, and robust concurrency, significantly reducing transcription times.

How It Works

The API leverages Hugging Face Transformers, Optimum, and flash-attn for accelerated inference. It employs fp16 precision, chunked batching (batch size of 24), and Flash Attention 2 for significant speedups. Speaker diarization is integrated via pyannote models, which requires Hugging Face authentication. The architecture is designed for high concurrency and parallel processing, making it suitable for production workloads.
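
The core inference path can be approximated with the public Transformers pipeline API. A minimal sketch, assuming the chunk length, batch size, and attention settings mirror the optimizations described above (the project's exact configuration may differ):

    # Sketch of Whisper Large v3 inference with fp16, Flash Attention 2, and
    # chunked batching via the Hugging Face Transformers pipeline. Values such
    # as chunk_length_s and batch_size are assumptions based on this summary.
    import torch
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3",
        torch_dtype=torch.float16,        # fp16 precision
        device="cuda:0",                  # requires an NVIDIA GPU
        model_kwargs={"attn_implementation": "flash_attention_2"},  # needs flash-attn installed
    )

    result = asr(
        "audio.mp3",
        chunk_length_s=30,       # split long audio into 30-second chunks
        batch_size=24,           # batch chunks together on the GPU
        return_timestamps=True,  # emit segment-level timestamps
    )
    print(result["text"])

    # Speaker diarization is handled separately via pyannote; the gated models
    # require an authenticated Hugging Face token, e.g. (model ID assumed):
    #   from pyannote.audio import Pipeline
    #   dia = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
    #                                  use_auth_token="<HF_TOKEN>")
    #   segments = dia("audio.mp3")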

Quick Start & Requirements

  • Install/Run: Deploy via Docker using yoeven/insanely-fast-whisper-api:latest or build from source. The README provides detailed instructions for Fly.io deployment (fly launch, fly secrets set ADMIN_KEY=<your_token> HF_TOKEN=<your_hf_key>).
  • Prerequisites: GPU (Nvidia A100 benchmarked), Docker, Fly.io CLI (for Fly deployment), Hugging Face token (for diarization).
  • Setup: Fly.io deployment with GPU and image pull can take time initially; subsequent deploys are faster. Local setup involves cloning, installing dependencies (including flash-attn, which has specific build requirements), and running uvicorn app.app:app; an example request against a running instance is sketched after this list.
  • Links: Fly.io GPU Service, Hugging Face Tokens, pyannote.audio.
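
Once an instance is running, a transcription request might look like the following sketch. The endpoint path, request fields, and auth header name are illustrative assumptions, not the project's documented API; consult the README for the actual surface.

    # Hypothetical client call against a deployed instance. Replace the URL,
    # header name, and field names with the ones documented in the README.
    import requests

    resp = requests.post(
        "https://your-app.fly.dev/",                  # assumed base URL after a Fly.io deploy
        headers={"x-admin-key": "<your_admin_key>"},  # assumed header carrying ADMIN_KEY
        json={
            "url": "https://example.com/audio.mp3",   # assumed: audio supplied by URL
            "task": "transcribe",                     # assumed: "transcribe" or "translate"
            "diarise_audio": True,                    # assumed flag for speaker diarization
        },
        timeout=600,
    )
    resp.raise_for_status()
    print(resp.json())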

Highlighted Details

  • Benchmarked ~2 minutes for 150 minutes of audio on an Nvidia A100 with optimizations.
  • Supports transcription, translation, language detection, and timestamp generation.
  • Includes task management, status checks, and cancellation for asynchronous jobs (see the sketch after this list).
  • Offers optional admin authentication and webhook support for results.
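
The asynchronous flow called out above (submit a job, check its status, cancel it, or receive results via webhook) might look roughly like the sketch below. Endpoint names, response fields, and the auth header are assumptions made for illustration only.

    # Hypothetical async task flow; real route names and payload shapes may differ.
    import time
    import requests

    BASE = "https://your-app.fly.dev"              # assumed deployment URL
    HEADERS = {"x-admin-key": "<your_admin_key>"}  # assumed auth header

    # Submit a job, optionally registering a webhook (field name assumed) for push-style results.
    job = requests.post(
        f"{BASE}/tasks",                           # assumed endpoint
        headers=HEADERS,
        json={
            "url": "https://example.com/audio.mp3",
            "webhook_url": "https://example.com/hook",
        },
    ).json()

    # Poll the task status until it finishes.
    while True:
        status = requests.get(f"{BASE}/tasks/{job['id']}", headers=HEADERS).json()  # assumed shape
        if status.get("status") in {"completed", "failed"}:
            break
        time.sleep(5)

    # A long-running job could instead be cancelled (assumed route):
    # requests.delete(f"{BASE}/tasks/{job['id']}", headers=HEADERS)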

Maintenance & Community

The project is part of JigsawStack, which offers managed APIs. The core code is based on the Insanely Fast Whisper CLI project by Vaibhav Srivastav. Community links are not explicitly provided in the README.

Licensing & Compatibility

The project is open source and deployable on any GPU cloud provider that supports Docker. Specific licensing details (e.g., MIT, Apache) are not explicitly stated in the README, so suitability for commercial use and closed-source linking should be confirmed with the maintainers before adoption.

Limitations & Caveats

The large Docker image size can lead to long initial deployment times. Speaker diarization requires accepting user conditions and providing a Hugging Face token. Fly.io machines may take up to 15 minutes to auto-shut down after idling, incurring costs if not manually stopped.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 90 days
