qwen-asr by antirez

Pure C ASR inference engine

Created 3 months ago

553 stars

Top 57.3% on SourcePulse

Summary

This project provides a pure C inference engine for the Qwen3-ASR speech-to-text models (0.6B and 1.7B parameters). It targets engineers and power users who need efficient, low-dependency ASR across diverse hardware, particularly Linux servers, and it aims for fast transcription even on modest CPUs.

How It Works

The core is a C implementation of the Qwen3-ASR inference pipeline, requiring only a standard C library and a BLAS implementation (Accelerate on macOS, OpenBLAS on Linux). It supports both offline and streaming transcription modes, outputting tokens directly to stdout. The design prioritizes CPU performance and rapid model loading via memory-mapped safetensors, while explicitly omitting MPS support to focus on broader server deployment.
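To illustrate why memory-mapped safetensors load quickly, here is a minimal sketch in POSIX C. It is not the project's code: the `st_map` struct and the `st_map_*` function names are hypothetical. It relies only on the published safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes). Mapping the file copies no tensor data up front; pages are read from disk when first touched.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical view over a memory-mapped safetensors file. */
typedef struct {
    const uint8_t *base;   /* start of the mapping */
    size_t size;           /* total file size in bytes */
    const char *header;    /* JSON header (not NUL-terminated) */
    uint64_t header_len;   /* JSON header length in bytes */
    const uint8_t *data;   /* first byte of raw tensor data */
} st_map;

/* Map the file read-only and locate the header and data regions.
 * Returns 0 on success, -1 on any error. */
int st_map_open(const char *path, st_map *m) {
    assert(path != NULL && m != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat sb;
    if (fstat(fd, &sb) != 0 || sb.st_size < 8) { close(fd); return -1; }

    void *p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;

    /* Decode the 8-byte little-endian header length. */
    const uint8_t *b = p;
    uint64_t n = 0;
    for (int i = 7; i >= 0; i--) n = (n << 8) | b[i];

    /* Reject a header that claims to extend past the end of the file. */
    if (n > (uint64_t)sb.st_size - 8) { munmap(p, (size_t)sb.st_size); return -1; }

    m->base = b;
    m->size = (size_t)sb.st_size;
    m->header = (const char *)(b + 8);
    m->header_len = n;
    m->data = b + 8 + n;
    return 0;
}

/* Print the raw JSON header, followed by a newline. */
void st_map_print_header(const st_map *m, FILE *out) {
    fwrite(m->header, 1, (size_t)m->header_len, out);
    fputc('\n', out);
}

/* Release the mapping. */
void st_map_close(st_map *m) {
    munmap((void *)m->base, m->size);
}
```

A caller would parse the JSON header to find each tensor's dtype, shape, and byte offsets, then point directly into `data` rather than copying weights into separate buffers.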

Quick Start & Requirements

Build with make blas. Download models using ./download_model.sh. Transcribe audio via ./qwen_asr -d <model_dir> -i <audio_file> or pipe audio from ffmpeg using --stdin. Key requirements include a C compiler and a BLAS library (OpenBLAS or Accelerate). MPS (Apple Silicon GPU) support is not included.

Highlighted Details

Minimal Dependencies: Pure C with only standard library and BLAS.
Model Support: Fully supports both 0.6B and 1.7B Qwen3-ASR variants.
Flexible Modes: Offers normal (full offline decode) and streaming (chunked processing with rollback) transcription.
Real-time Output: Tokens stream to stdout as generated, enabling incremental transcription.
Input Flexibility: Accepts audio via stdin, easily integrated with tools like ffmpeg.
Fast Loading: Utilizes memory-mapped safetensors for near-instantaneous model loading.
Language Control: Auto-detects language or allows explicit setting via --language.
Prompt Biasing: System prompts (--prompt) can subtly influence model output for specific terms.
Diagnostics: Includes a --monitor mode for real-time pipeline visualization.
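
The real-time output behavior listed above depends on a C detail worth spelling out: when stdout is a pipe rather than a terminal, the C library normally buffers it fully, so tokens would reach the consumer in large blocks unless the stream is flushed. The sketch below shows the general pattern; `emit_token` is a hypothetical helper, not a function taken from the project.

```c
#include <assert.h>
#include <stdio.h>

/* Write one decoded text piece and flush immediately, so a downstream
 * reader (a pipe, a terminal, a log tailer) sees it without delay.
 * Returns 0 on success, -1 on a write or flush error. */
int emit_token(FILE *out, const char *piece) {
    assert(out != NULL && piece != NULL);
    if (fputs(piece, out) == EOF) return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

In a decode loop this would be called once per generated token, for example `emit_token(stdout, piece)`. An alternative with a similar effect is to disable buffering once at startup with `setvbuf(stdout, NULL, _IONBF, 0)`.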

Maintenance & Community

The README does not name maintainers, list community channels (e.g., Discord, Slack), or publish a project roadmap.

Licensing & Compatibility

The project is released under the MIT license, which permits broad use, including commercial applications. On the compatibility side, it does not support Apple's MPS (Metal Performance Shaders) framework.

Limitations & Caveats

MPS support is intentionally omitted, so users who want Apple Silicon GPU acceleration must fork the repository. Streaming mode prioritizes incremental stability over raw throughput, so on prerecorded files it may be slower overall than offline mode. Prompt biasing is described as a subtle influence on the output rather than a strict command.
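
The summary does not describe how the streaming rollback works, so the following is only a generic sketch of one common way to trade throughput for stable incremental output, not the project's algorithm. Each new chunk yields a fresh hypothesis for the audio seen so far, and only the prefix on which two consecutive hypotheses agree is treated as committed. The remaining tail stays tentative and can be rolled back. The function names and the use of integer token ids are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the longest common prefix of two token-id sequences. */
size_t common_prefix(const int *a, size_t na, const int *b, size_t nb) {
    size_t n = na < nb ? na : nb;
    size_t i = 0;
    while (i < n && a[i] == b[i]) i++;
    return i;
}

/* Given the previous and current hypotheses and the number of tokens
 * already committed, return the new committed count: the agreed prefix
 * length, but never fewer tokens than were committed before. */
size_t stable_count(const int *prev, size_t nprev,
                    const int *cur, size_t ncur, size_t committed) {
    assert(prev != NULL && cur != NULL);
    size_t agreed = common_prefix(prev, nprev, cur, ncur);
    return agreed > committed ? agreed : committed;
}
```

Under this scheme every chunk is decoded again with overlapping context, which is the kind of extra work that makes a streaming pass over a prerecorded file slower than a single offline decode.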
