API for local Whisper ASR/translation, supporting distributed deployment
This project provides a high-performance, asynchronous API for Automatic Speech Recognition (ASR) and translation using locally run Whisper and Faster Whisper models. It targets developers and researchers who need scalable, distributed ASR without relying on paid cloud APIs, offering features such as multi-GPU support, social media crawling, and planned workflow customization.
How It Works
The API is built on Python's asyncio for efficient, concurrent request handling. It uses the Faster Whisper model for speed and accuracy, managing model instances through an asynchronous model pool that supports multi-GPU concurrency. Tasks are processed via a producer-consumer model, with SQLite and MySQL backends available for task management and distributed deployment. An integrated crawler module fetches media from platforms such as TikTok and Douyin, creating tasks directly from URLs.
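As a rough illustration of that design, the sketch below pairs an asyncio model pool (one Faster Whisper instance per GPU) with a producer-consumer task queue. It is a minimal sketch of the pattern, not the project's actual code; the class names, GPU indices, and audio paths are illustrative assumptions.

```python
import asyncio
from faster_whisper import WhisperModel

def transcribe_sync(model, audio_path):
    # Runs the full (blocking) decode; faster-whisper yields segments lazily,
    # so they must be consumed here, off the event loop.
    segments, info = model.transcribe(audio_path, beam_size=5)
    return info.language, " ".join(s.text for s in segments)

class ModelPool:
    """Hands out one Faster Whisper instance per GPU to concurrent workers."""
    def __init__(self, gpu_indices, model_size="base"):
        self._idle = asyncio.Queue()
        for idx in gpu_indices:
            self._idle.put_nowait(
                WhisperModel(model_size, device="cuda",
                             device_index=idx, compute_type="float16"))

    async def acquire(self):
        return await self._idle.get()

    def release(self, model):
        self._idle.put_nowait(model)

async def worker(pool, tasks):
    # Consumer side: pull a task, borrow a model, transcribe, return the model.
    while True:
        audio_path = await tasks.get()
        model = await pool.acquire()
        try:
            language, text = await asyncio.to_thread(transcribe_sync, model, audio_path)
            print(f"{audio_path} [{language}]: {text}")
        finally:
            pool.release(model)
            tasks.task_done()

async def main():
    tasks = asyncio.Queue()
    pool = ModelPool(gpu_indices=[0, 1])            # assumes two CUDA devices
    workers = [asyncio.create_task(worker(pool, tasks)) for _ in range(2)]
    for path in ("a.wav", "b.wav"):                 # producer side: enqueue audio files
        tasks.put_nowait(path)
    await tasks.join()
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)

if __name__ == "__main__":
    asyncio.run(main())
```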
Quick Start & Requirements
Install dependencies: pip install -r requirements.txt
Start the service: python3 start.py
The API is then available at http://127.0.0.1/
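Once the service is running, tasks are created over HTTP. The snippet below is a hedged illustration: the endpoint path, form fields, and response shape are assumptions rather than the project's documented interface, so consult the API documentation served by the running instance for the real names.

```python
import requests

# Hypothetical endpoint and fields -- check the live API docs for the actual names.
API_URL = "http://127.0.0.1/whisper/tasks/create"

with open("sample.wav", "rb") as audio:
    resp = requests.post(
        API_URL,
        files={"file": ("sample.wav", audio, "audio/wav")},
        data={"task_type": "transcribe", "language": "auto"},
    )
resp.raise_for_status()
print(resp.json())  # typically a task ID that can be polled for the finished transcript
```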
Maintenance & Community
The project is actively maintained by Evil0ctal. Community interaction is encouraged via GitHub issues.
Licensing & Compatibility
Licensed under Apache 2.0. Commercial use and custom cooperation require contacting the maintainer via email.
Limitations & Caveats
Multi-GPU concurrency requires a machine with more than one GPU; single-GPU setups are limited to one device. Workflow and event-driven features are marked as "Pending" and are not yet implemented.