Fast-Powerful-Whisper-AI-Services-API by Evil0ctal

API for local Whisper ASR/translation, supporting distributed deployment

created 9 months ago
394 stars

Top 74.2% on sourcepulse

Project Summary

This project provides a high-performance, asynchronous API for Automatic Speech Recognition (ASR) and translation using locally run Whisper and Faster Whisper models. It targets developers and researchers who need scalable, distributed ASR without relying on paid cloud APIs, offering features such as multi-GPU support, social media crawling, and planned workflow customization.

How It Works

The API is built on Python's asyncio for efficient, concurrent request handling. It utilizes the Faster Whisper model for speed and accuracy, managing model instances through an asynchronous model pool that supports multi-GPU concurrency. Tasks are processed via a producer-consumer model, with support for SQLite and MySQL databases for task management and distributed deployment. An integrated crawler module fetches media from platforms like TikTok and Douyin, creating tasks directly from URLs.

Quick Start & Requirements

  • Install: Clone the repository and run pip install -r requirements.txt.
  • Prerequisites: Python >= 3.8 (3.12 recommended), FFmpeg, CUDA Toolkit (for GPU acceleration), and PyTorch with CUDA support.
  • Run: Execute python3 start.py.
  • Docs: Once the server is running, interactive API documentation is available at http://127.0.0.1/ (a minimal client sketch follows this list).
  • Setup: Estimated setup time depends on FFmpeg and CUDA installation, typically 15-30 minutes.

Highlighted Details

  • Supports both Whisper and Faster Whisper models.
  • Integrated crawlers for TikTok and Douyin video processing.
  • Asynchronous model pool for efficient multi-GPU utilization.
  • Callback notifications for task completion (see the callback receiver sketch after this list).
  • Supports transcription and translation tasks.
  • Future plans for custom workflows and LLM integration (e.g., ChatGPT).

Maintenance & Community

The project is actively maintained by Evil0ctal. Community interaction is encouraged via GitHub issues.

Licensing & Compatibility

Licensed under Apache 2.0. Commercial use and custom cooperation require contacting the maintainer via email.

Limitations & Caveats

Multi-GPU concurrency is unavailable on single-GPU setups. Workflow and event-driven features are marked as "Pending" and are not yet implemented.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 37 stars in the last 90 days
