API for local Whisper ASR/translation, supporting distributed deployment
This project provides a high-performance, asynchronous API for Automatic Speech Recognition (ASR) and translation using locally run Whisper and Faster Whisper models. It targets developers and researchers who need scalable, distributed ASR without relying on paid cloud APIs, offering features such as multi-GPU support, social media crawling, and planned workflow customization.
How It Works
The API is built on Python's asyncio for efficient, concurrent request handling. It uses the Faster Whisper model for speed and accuracy, managing model instances through an asynchronous model pool that supports multi-GPU concurrency. Tasks are processed via a producer-consumer model, with SQLite and MySQL backends available for task management and distributed deployment. An integrated crawler module fetches media from platforms such as TikTok and Douyin, creating tasks directly from URLs.
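As a rough illustration of that design, the sketch below pairs an asyncio model pool (one Faster Whisper instance per GPU) with a producer-consumer task queue. It is a minimal sketch of the pattern, not the project's actual code; the class names, GPU indices, and audio paths are illustrative assumptions.

```python
import asyncio
from faster_whisper import WhisperModel

def transcribe_sync(model, audio_path):
    # Runs the full (blocking) decode; faster-whisper yields segments lazily,
    # so they must be consumed here, off the event loop.
    segments, info = model.transcribe(audio_path, beam_size=5)
    return info.language, " ".join(s.text for s in segments)

class ModelPool:
    """Hands out one Faster Whisper instance per GPU to concurrent workers."""
    def __init__(self, gpu_indices, model_size="base"):
        self._idle = asyncio.Queue()
        for idx in gpu_indices:
            self._idle.put_nowait(
                WhisperModel(model_size, device="cuda",
                             device_index=idx, compute_type="float16"))

    async def acquire(self):
        return await self._idle.get()

    def release(self, model):
        self._idle.put_nowait(model)

async def worker(pool, tasks):
    # Consumer side: pull a task, borrow a model, transcribe, return the model.
    while True:
        audio_path = await tasks.get()
        model = await pool.acquire()
        try:
            language, text = await asyncio.to_thread(transcribe_sync, model, audio_path)
            print(f"{audio_path} [{language}]: {text}")
        finally:
            pool.release(model)
            tasks.task_done()

async def main():
    tasks = asyncio.Queue()
    pool = ModelPool(gpu_indices=[0, 1])            # assumes two CUDA devices
    workers = [asyncio.create_task(worker(pool, tasks)) for _ in range(2)]
    for path in ("a.wav", "b.wav"):                 # producer side: enqueue audio files
        tasks.put_nowait(path)
    await tasks.join()
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)

if __name__ == "__main__":
    asyncio.run(main())
```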
Quick Start & Requirements
Install dependencies: pip install -r requirements.txt
Start the service: python3 start.py
The API is then available at http://127.0.0.1/
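Once the service is running, tasks are created over HTTP. The snippet below is a hedged illustration: the endpoint path, form fields, and response shape are assumptions rather than the project's documented interface, so consult the API documentation served by the running instance for the real names.

```python
import requests

# Hypothetical endpoint and fields -- check the live API docs for the actual names.
API_URL = "http://127.0.0.1/whisper/tasks/create"

with open("sample.wav", "rb") as audio:
    resp = requests.post(
        API_URL,
        files={"file": ("sample.wav", audio, "audio/wav")},
        data={"task_type": "transcribe", "language": "auto"},
    )
resp.raise_for_status()
print(resp.json())  # typically a task ID that can be polled for the finished transcript
```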
Maintenance & Community
The project is actively maintained by Evil0ctal. Community interaction is encouraged via GitHub issues.
Licensing & Compatibility
Licensed under Apache 2.0. Commercial use and custom cooperation require contacting the maintainer via email.
Limitations & Caveats
Multi-GPU concurrency requires a machine with more than one GPU; single-GPU setups are limited to one device. Workflow and event-driven features are marked as "Pending" and are not yet implemented.