QwenLM: Advanced multilingual speech recognition and alignment
Top 25.2% on SourcePulse
Summary
Qwen3-ASR is an open-source automatic speech recognition (ASR) model family from Alibaba Cloud's Qwen team, offering robust multilingual recognition of speech, music, and song. It includes two all-in-one models (1.7B and 0.6B parameters) that support 52 languages and dialects, plus a novel non-autoregressive forced aligner for precise timestamp prediction. The project reports state-of-the-art results among open-source models, performance competitive with commercial APIs, and advanced audio understanding capabilities.
How It Works
Built on large-scale speech data and the Qwen3-Omni foundation model, Qwen3-ASR offers two primary ASR models: the higher-performing 1.7B version and the 0.6B version, which balances accuracy against efficiency and delivers high throughput. A key innovation is Qwen3-ForcedAligner-0.6B, a non-autoregressive model reported to deliver superior timestamp accuracy for text-speech alignment across 11 languages. The models support both streaming and offline inference within a single unified framework.
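Word-level timestamps from a forced aligner are commonly turned into subtitles. The sketch below assumes the aligner's output can be reduced to a list of (word, start, end) records in seconds; the `WordStamp` class and both helper functions are hypothetical illustrations, not part of the qwen-asr API, and the real return format may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordStamp:
    # Hypothetical record for one aligned word; the real aligner output
    # format in qwen-asr may differ.
    text: str
    start: float  # seconds
    end: float  # seconds


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[WordStamp], max_words: int = 7) -> str:
    """Group consecutive words into numbered SRT cues of at most max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = srt_time(chunk[0].start), srt_time(chunk[-1].end)
        # Space-joining suits space-delimited languages; CJK text would join with "".
        text = " ".join(w.text for w in chunk)
        cues.append(f"{i // max_words + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


words = [WordStamp("hello", 0.0, 0.42), WordStamp("world", 0.5, 1.05)]
print(words_to_srt(words))
```

Each cue spans from its first word's start to its last word's end, so the cue boundaries inherit whatever precision the aligner provides.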
Quick Start & Requirements
Install via pip: pip install -U qwen-asr, or pip install -U "qwen-asr[vllm]" for the vLLM backend (quoting the extras spec keeps shells such as zsh from interpreting the brackets). Python 3.12 is recommended. GPU acceleration is crucial for practical performance. FlashAttention 2 (pip install -U flash-attn --no-build-isolation) is recommended for speed and memory efficiency, but it requires compatible hardware and a float16 or bfloat16 dtype. Official demos and examples are available on Hugging Face and ModelScope.
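Because FlashAttention 2 only applies under float16/bfloat16, it can help to gate it on both conditions before loading a model. This is a minimal sketch using only the standard library; the returned strings mirror Hugging Face Transformers' `attn_implementation` values, and whether qwen-asr accepts such an option is an assumption. It does not check GPU architecture, which FlashAttention 2 also requires.

```python
import importlib.util
import sys

# dtypes under which FlashAttention 2 can run, per the requirement above.
FLASH_DTYPES = {"float16", "bfloat16"}


def flash_attn_installed() -> bool:
    """True if the flash_attn package is importable in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


def pick_attention(dtype: str, has_flash: bool) -> str:
    """Use FlashAttention 2 only when it is installed and the dtype qualifies.

    The return values are assumed Transformers-style names, not a confirmed
    qwen-asr parameter.
    """
    if has_flash and dtype in FLASH_DTYPES:
        return "flash_attention_2"
    return "sdpa"  # PyTorch scaled-dot-product attention fallback


def meets_recommended_python(recommended=(3, 12)) -> bool:
    """Compare the running interpreter against the recommended version."""
    return sys.version_info[:2] >= recommended


if __name__ == "__main__":
    if not meets_recommended_python():
        print("Python 3.12 is recommended for qwen-asr")
    print(pick_attention("bfloat16", flash_attn_installed()))
```

Passing availability in as a parameter keeps `pick_attention` testable without a GPU or the flash-attn package installed.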
Highlighted Details
Maintenance & Community
Developed by Alibaba Cloud's Qwen team, with recent updates in January 2026. Community support is available via WeChat and Discord. Links to official blogs and demos are provided.
Licensing & Compatibility
The README does not name a specific open-source license, so verify the licensing terms before commercial use or integration.
Limitations & Caveats
FlashAttention 2 has hardware and dtype prerequisites. The vLLM backend requires careful setup. Timestamp prediction relies on the separate Qwen3-ForcedAligner-0.6B model.
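Since timestamps come from a separate model, a pipeline can treat alignment as an optional second stage. The sketch below uses hypothetical `transcribe` and `align` callables as stand-ins for the ASR model and Qwen3-ForcedAligner-0.6B; their signatures are assumptions, not qwen-asr's actual API.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: a transcriber maps audio to text; an aligner maps
# (audio, text) to (word, start_s, end_s) triples. Neither is qwen-asr's real API.
Transcriber = Callable[[bytes], str]
Aligner = Callable[[bytes, str], List[Tuple[str, float, float]]]


def transcribe_with_timestamps(
    audio: bytes,
    transcribe: Transcriber,
    align: Optional[Aligner] = None,
    want_timestamps: bool = False,
) -> Tuple[str, Optional[List[Tuple[str, float, float]]]]:
    """Run ASR, then optionally run the separate aligner on the same audio and text."""
    text = transcribe(audio)
    if not want_timestamps:
        return text, None
    if align is None:
        # Timestamps need the separate forced-aligner model, so fail loudly.
        raise ValueError("timestamps requested but no aligner model was provided")
    return text, align(audio, text)


# Stub models for illustration only.
fake_asr = lambda audio: "hello world"
fake_align = lambda audio, text: [
    (w, float(i), float(i) + 0.5) for i, w in enumerate(text.split())
]

print(transcribe_with_timestamps(b"", fake_asr, fake_align, want_timestamps=True))
```

Keeping the aligner optional means the second model only needs to be loaded when timestamps are actually requested.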
3 weeks ago
Inactive
oliverguhr
janhq
openai