ASR toolkit for production-ready end-to-end speech recognition
WeNet is a production-ready, end-to-end speech recognition toolkit designed for both streaming and non-streaming applications. It offers a full-stack solution for ASR development, targeting researchers and engineers who need accurate, lightweight, and well-documented tools for building and deploying speech recognition systems.
How It Works
WeNet supports both Transformer and Conformer models, using a hybrid approach that combines the strengths of different architectures to reach state-of-the-art accuracy. It supports WFST-based decoding for language model (LM) integration and provides efficient runtimes for deployment.
Quick Start & Requirements
Install the Python package: pip install git+https://github.com/wenet-e2e/wenet.git
Environment setup: create a Conda environment (conda create -n wenet python=3.10), install sox (conda install conda-forge::sox), PyTorch (pip install torch==2.2.2+cu121 torchaudio==2.2.2+cu121 -f https://download.pytorch.org/whl/torch_stable.html), and other dependencies (pip install -r requirements.txt).
System packages: sox and libsox-dev (Ubuntu/CentOS). Ascend NPU support requires the CANN toolkit.
Build requirement: cmake 3.14+.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The runtime build for x86 or LM integration requires manual compilation steps. Specific hardware acceleration (e.g., Ascend NPU) necessitates separate installation of vendor-specific toolkits and kernel drivers.
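Because the build steps are manual, it can help to confirm the prerequisites listed under Quick Start & Requirements before starting. The following is a minimal sketch of such a check, not part of WeNet itself: the helper names (parse_version, tool_version, check_environment) are hypothetical, and it treats Python 3.10 only as the version used in the conda command, not as a hard requirement.

```python
import re
import shutil
import subprocess
import sys

MIN_CMAKE = (3, 14)      # from the stated requirement "cmake 3.14+"
CONDA_PYTHON = (3, 10)   # version used in "conda create -n wenet python=3.10"


def parse_version(text):
    """Return the first dotted version found in `text` as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def tool_version(name):
    """Run `<name> --version` and parse its output; None if the tool is unavailable."""
    if shutil.which(name) is None:
        return None
    try:
        proc = subprocess.run(
            [name, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return parse_version(proc.stdout + proc.stderr)


def check_environment():
    """Collect human-readable findings; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] != CONDA_PYTHON:
        found = ".".join(str(n) for n in sys.version_info[:2])
        problems.append(f"Python {found} differs from the 3.10 used in the conda command")
    if shutil.which("sox") is None:
        problems.append("sox not found on PATH")
    cmake = tool_version("cmake")
    if cmake is None:
        problems.append("cmake not found (needed for the runtime build)")
    elif cmake[:2] < MIN_CMAKE:
        found = ".".join(str(n) for n in cmake)
        problems.append(f"cmake {found} is older than 3.14")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run as a script, it prints one warning per missing or outdated prerequisite and prints nothing when all checks pass. It does not check libsox-dev or the CANN toolkit, which depend on the system package manager and the hardware vendor's installer.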