qwen-asr by antirez

Pure C ASR inference engine

Created 1 week ago

388 stars

Top 74.1% on SourcePulse

Summary

This project provides a pure C inference engine for Qwen3-ASR speech-to-text models (0.6B and 1.7B parameters). It targets engineers and power users needing efficient, low-dependency ASR on diverse hardware, particularly Linux servers, offering high-speed transcription even on modest CPUs.

How It Works

The core is a C implementation of the Qwen3-ASR inference pipeline, requiring only a standard C library and a BLAS implementation (Accelerate on macOS, OpenBLAS on Linux). It supports both offline and streaming transcription modes, outputting tokens directly to stdout. The design prioritizes CPU performance and rapid model loading via memory-mapped safetensors, while explicitly omitting MPS support to focus on broader server deployment.
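
To illustrate why memory-mapping makes loading fast, here is a minimal sketch, not the project's actual code. It relies on the published safetensors layout (an 8-byte little-endian header length, a JSON header, then the raw tensor bytes) and on POSIX mmap. The names st_map, st_map_open, and st_map_close are hypothetical and exist only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical type: a read-only view of a mapped safetensors file. */
typedef struct {
    const uint8_t *base;   /* start of the mapping */
    size_t size;           /* total file size in bytes */
    const char *header;    /* JSON header, NOT NUL-terminated */
    uint64_t header_len;   /* JSON header length in bytes */
    const uint8_t *data;   /* first byte of the tensor data region */
} st_map;

/* Map `path` read-only and locate the JSON header and the data region.
 * Returns 0 on success, -1 on any error. */
int st_map_open(const char *path, st_map *m) {
    assert(path != NULL && m != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat sb;
    if (fstat(fd, &sb) != 0 || sb.st_size < 8) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;

    /* The first 8 bytes hold the header length as a little-endian u64. */
    const uint8_t *b = (const uint8_t *)p;
    uint64_t n = 0;
    for (int i = 7; i >= 0; i--) n = (n << 8) | b[i];

    /* Reject a header that claims to extend past the end of the file. */
    if (n > (uint64_t)sb.st_size - 8) {
        munmap(p, (size_t)sb.st_size);
        return -1;
    }

    m->base = b;
    m->size = (size_t)sb.st_size;
    m->header = (const char *)(b + 8);
    m->header_len = n;
    m->data = b + 8 + n;
    return 0;
}

/* Release the mapping created by st_map_open. */
void st_map_close(st_map *m) {
    if (m->base != NULL) munmap((void *)m->base, m->size);
    m->base = NULL;
}
```

Because the file is mapped rather than read, no tensor bytes are copied at open time; the OS pages them in on first access. A real loader would go on to parse the JSON header to find each tensor's dtype, shape, and byte offsets within the data region.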

Quick Start & Requirements

Build with make blas. Download models using ./download_model.sh. Transcribe audio via ./qwen_asr -d <model_dir> -i <audio_file> or pipe audio from ffmpeg using --stdin. Key requirements include a C compiler and a BLAS library (OpenBLAS or Accelerate). MPS (Apple Silicon GPU) support is not included.

Highlighted Details

  • Minimal Dependencies: Pure C with only standard library and BLAS.
  • Model Support: Fully supports both 0.6B and 1.7B Qwen3-ASR variants.
  • Flexible Modes: Offers normal (full offline decode) and streaming (chunked processing with rollback) transcription.
  • Real-time Output: Tokens stream to stdout as generated, enabling incremental transcription.
  • Input Flexibility: Accepts audio via stdin, easily integrated with tools like ffmpeg.
  • Fast Loading: Utilizes memory-mapped safetensors for near-instantaneous model loading.
  • Language Control: Auto-detects language or allows explicit setting via --language.
  • Prompt Biasing: System prompts (--prompt) can subtly influence model output for specific terms.
  • Diagnostics: Includes a --monitor mode for real-time pipeline visualization.
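
The stdin input path mentioned in the list above can be sketched as follows. This is an illustration under a stated assumption, not the project's code: the summary does not say which format --stdin expects, so the sketch assumes raw signed 16-bit little-endian mono PCM, a format ffmpeg can emit. The function names pcm_s16le_to_float and read_pcm_chunk are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Convert `n` samples of signed 16-bit little-endian PCM (2*n raw bytes)
 * into floats in the range [-1, 1). The input format is an assumption. */
void pcm_s16le_to_float(const unsigned char *raw, size_t n, float *out) {
    assert(raw != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Assemble the two bytes, low byte first. */
        int32_t v = (int32_t)raw[2 * i] | ((int32_t)raw[2 * i + 1] << 8);
        /* Sign-extend from 16 bits without relying on narrowing casts. */
        if (v >= 32768) v -= 65536;
        out[i] = (float)v / 32768.0f;
    }
}

/* Read up to `max_samples` samples from `in` (for example, stdin fed by a
 * pipe) and convert them. Returns the number of whole samples converted;
 * a trailing odd byte at end of stream is dropped. */
size_t read_pcm_chunk(FILE *in, float *out, size_t max_samples) {
    unsigned char buf[4096];
    size_t done = 0;
    while (done < max_samples) {
        size_t want = max_samples - done;
        if (want > sizeof buf / 2) want = sizeof buf / 2;
        size_t got = fread(buf, 2, want, in); /* counts complete samples */
        if (got == 0) break;                  /* end of stream or error */
        pcm_s16le_to_float(buf, got, out + done);
        done += got;
    }
    return done;
}
```

Reading in fixed-size chunks like this is what lets a consumer start work before the whole stream has arrived, which is the property a streaming mode depends on.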

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap were found in the provided README content.

Licensing & Compatibility

The project is released under the MIT license, permitting broad use, including commercial applications. On the compatibility side, Apple's MPS (Metal Performance Shaders) framework is explicitly not supported.

Limitations & Caveats

MPS support is intentionally omitted, so Apple Silicon GPU acceleration would require forking the repository. Streaming mode prioritizes incremental stability over raw throughput, so on prerecorded files it may be slower overall than the offline mode. Prompt biasing nudges the model's output rather than acting as a strict command.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 4
  • Issues (30d): 3
  • Star History: 389 stars in the last 13 days

Explore Similar Projects

Starred by Dan Guido (Cofounder of Trail of Bits), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 3 more.

voxtral.c by antirez

5.3%
1k
Pure C speech-to-text inference engine for Mistral Voxtral Realtime 4B
Created 2 weeks ago
Updated 1 week ago