Quantatirsk/funasr-api: Local Speech Recognition API Service
Top 96.6% on SourcePulse
Quantatirsk/funasr-api provides a ready-to-use, local speech recognition API service powered by FunASR and Qwen3-ASR. It supports 52 languages and offers compatibility with both OpenAI API and Alibaba Cloud Speech API standards. This project benefits engineers and researchers by enabling local deployment of advanced, multi-language ASR capabilities with features like speaker diarization and real-time streaming.
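Because the service follows the OpenAI API standard, a transcription call can be sketched with only the Python standard library. This is a minimal sketch under stated assumptions: the route `/v1/audio/transcriptions` is the path OpenAI's own API uses and is assumed, not confirmed, for this project; the model name `paraformer-large` and the helper names are hypothetical; the port comes from the localhost URLs listed under Quick Start.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "http://localhost:17003"           # port from the project's quick-start URLs
TRANSCRIBE_PATH = "/v1/audio/transcriptions"  # assumption: OpenAI's standard route


def build_multipart(fields, filename, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(audio_bytes, filename, model="paraformer-large", base_url=BASE_URL):
    """Build (but do not send) the POST request; the model name is hypothetical."""
    body, content_type = build_multipart({"model": model}, filename, audio_bytes)
    return urllib.request.Request(
        base_url + TRANSCRIBE_PATH,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def transcribe(path, model="paraformer-large"):
    """Send an audio file to a running service and return the parsed JSON reply."""
    with open(path, "rb") as f:
        request = build_request(f.read(), os.path.basename(path), model)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With the service running, `transcribe("sample.wav")` would return the decoded JSON response; its exact fields depend on the service and are not assumed here.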
How It Works
The service integrates multiple state-of-the-art ASR models, including Qwen3-ASR (1.7B/0.6B) and Paraformer Large, leveraging vLLM for efficient Qwen3-ASR inference. It adopts familiar API interfaces (OpenAI, Alibaba Cloud) for seamless integration. Key features include CAM++ based speaker diarization, intelligent audio segmentation via VAD and greedy merging, and GPU batch processing for performance gains, making advanced ASR accessible locally.
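The idea of merging VAD segments greedily can be illustrated with a short sketch. The project's actual merge rule is not described here, so the following assumes one plausible variant: consecutive speech segments are combined as long as the merged span stays within a maximum length. The 30-second limit and the function name are illustrative assumptions, not values taken from the project.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec) as reported by a VAD


def greedy_merge(segments: List[Segment], max_len: float = 30.0) -> List[Segment]:
    """Merge consecutive segments while each merged span stays <= max_len seconds.

    max_len is an assumed limit. A single segment longer than max_len is
    kept as-is; this sketch does not split it.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and end - merged[-1][0] <= max_len:
            # Extending the current chunk keeps it within the limit.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Start a new chunk.
            merged.append((start, end))
    return merged


chunks = greedy_merge([(0, 5), (6, 12), (13, 28), (29, 40), (41, 45)])
print(chunks)  # [(0, 28), (29, 45)]
```

Fewer, longer chunks mean fewer model calls and fuller batches, which is presumably why segmentation and GPU batch processing are paired in the design.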
Quick Start & Requirements
Primary Install: Docker Deployment (Recommended).
Prerequisites: Python 3.10+, CUDA 12.6+ (for GPU acceleration), FFmpeg.
Resource Footprint: Minimum (CPU): 4 cores, 16GB RAM, 20GB disk. Recommended (GPU): 4 cores, 16GB RAM, NVIDIA GPU (16GB+ VRAM), 20GB disk.
Links:
docker-compose up -d
docker-compose -f docker-compose-cpu.yml up -d
http://localhost:17003
http://localhost:17003/docs
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
FunASR streaming does not support word-level timestamps or confidence scores. Qwen3 models require a GPU and vLLM backend; CPU environments automatically filter them. Word-level timestamps are exclusively available for Qwen3-ASR streaming.