onnx-asr by istupakov: Lightweight ONNX-based Automatic Speech Recognition (ASR)
Summary
onnx-asr is a lightweight Python package designed for efficient Automatic Speech Recognition (ASR) using ONNX models. It targets engineers and researchers seeking a fast, easy-to-use ASR solution with minimal dependencies, deployable across diverse hardware from edge devices to servers. The package simplifies the integration of various state-of-the-art ASR models into applications without requiring heavy deep learning frameworks.
How It Works
The package runs inference through ONNX Runtime, so applications do not need heavy deep learning frameworks such as PyTorch or Transformers at run time. It supports ONNX exports from a wide range of ASR model families, including NeMo, Kaldi, Vosk, GigaAM, and Whisper, and supplies the preprocessors and decoders each one needs. This approach enables cross-platform compatibility and efficient execution on CPUs and on hardware accelerators such as GPUs.
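To make the role of a decoder concrete, here is a minimal sketch of greedy CTC decoding, one common way to turn per-frame model scores into text. It is an illustration only, not code from onnx-asr: the function name `ctc_greedy_decode`, the toy vocabulary, and the choice of index 0 as the blank token are all assumptions, and not every supported model family uses CTC.

```python
def ctc_greedy_decode(frame_scores, vocab, blank_id=0):
    """Pick the best token per frame, merge repeats, and drop blanks."""
    tokens = []
    prev = None
    for scores in frame_scores:
        # Index of the highest-scoring token for this frame
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the token is not blank and differs from the previous frame
        if best != blank_id and best != prev:
            tokens.append(vocab[best])
        prev = best
    return "".join(tokens)


# Hypothetical three-token vocabulary and hand-written frame scores
vocab = ["<blank>", "h", "i"]
frames = [
    [0.1, 0.8, 0.1],    # "h"
    [0.1, 0.7, 0.2],    # "h" again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # "i"
    [0.2, 0.1, 0.7],    # "i" again, merged
]
decoded = ctc_greedy_decode(frames, vocab)
print(decoded)  # hi
```

A blank between two identical tokens keeps both, which is how CTC represents doubled letters.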
Quick Start & Requirements
Install via pip: pip install onnx-asr[cpu,hub] for CPU, or pip install onnx-asr[gpu,hub] for GPU acceleration. GPU usage requires a compatible CUDA/TensorRT setup and may also need pip install onnxruntime-gpu[cuda,cudnn] tensorrt-cu12-libs. The package supports Python 3.10 to 3.14 and NumPy from 1.22.4 up to 2.4+. A demo is available on Hugging Face Spaces.
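The requirements above can be checked before installing. The following is a small pre-flight sketch, not part of onnx-asr: the helper names `python_supported` and `install_command` are hypothetical, and the version bounds are simply the ones quoted in this summary.

```python
import sys

# Bounds taken from the summary above (Python 3.10 to 3.14)
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 14)


def python_supported(version=None):
    """Return True if the (major, minor) version falls in the quoted range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def install_command(use_gpu):
    """Build the pip command with the CPU or GPU extras named in the summary."""
    extras = "gpu,hub" if use_gpu else "cpu,hub"
    return f"pip install onnx-asr[{extras}]"


if __name__ == "__main__":
    if python_supported():
        print(install_command(use_gpu=False))
    else:
        print("This interpreter is outside the supported Python range.")
```

In shells that treat square brackets as glob patterns, such as zsh, quote the package spec, for example pip install "onnx-asr[cpu,hub]".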
Highlighted Details
onnx-asr boasts broad hardware support, running on x86 and Arm CPUs, and accelerating with CUDA, TensorRT, CoreML, ROCm, and DirectML. It handles batch processing, long-form audio via Voice Activity Detection (VAD), and can output token-level timestamps and log probabilities. Quantized models are supported for enhanced performance. A simple CLI and a Gradio web interface are also provided.
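The idea behind VAD-based long-form handling can be sketched as follows: find stretches of speech, cut at silences, and cap each segment so it fits within a model's input limit. This is a deliberately naive energy-threshold stand-in, not the VAD that onnx-asr uses; the function name `split_on_silence`, the threshold, and the frame size are assumptions chosen for illustration.

```python
def split_on_silence(samples, sample_rate, frame_ms=30, threshold=0.01, max_seconds=30.0):
    """Return (start, end) sample indices of speech segments, each at most max_seconds long."""
    frame_len = max(1, round(sample_rate * frame_ms / 1000))
    max_len = round(sample_rate * max_seconds)
    segments = []
    start = None  # start index of the currently open speech segment, if any
    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        # Mean squared amplitude as a crude loudness measure
        energy = sum(x * x for x in frame) / len(frame)
        is_speech = energy > threshold
        if is_speech and start is None:
            start = pos
        elif not is_speech and start is not None:
            segments.append((start, pos))
            start = None
        # Force a cut if one more frame would push the open segment past the cap
        if start is not None and (pos + frame_len - start) + frame_len > max_len:
            segments.append((start, min(pos + frame_len, len(samples))))
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


# Synthetic signal: silence, a loud stretch, then silence (values are made up)
signal = [0.0] * 50 + [0.5] * 100 + [0.0] * 50
segments = split_on_silence(signal, sample_rate=1000, frame_ms=10)
print(segments)
```

Each returned segment could then be passed to the recognizer separately. A real VAD model is far more robust to noise than a fixed energy threshold.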
Maintenance & Community
The provided README does not contain specific details regarding notable contributors, sponsorships, or community channels like Discord or Slack.
Licensing & Compatibility
The project is released under the permissive MIT License, which allows commercial use and integration into closed-source projects, provided the copyright and license notice is retained.
Limitations & Caveats
A known issue exists with onnxruntime version 1.24 regarding symlinks in the Hugging Face cache; users may need to install an older onnxruntime version or specify download paths explicitly. Most models have a 20-30 second audio limit, so longer inputs need VAD. Supported WAV formats are limited to PCM variants; other audio types must be converted first or loaded with a library such as soundfile. Some older onnx-community models may have broken fp16 precision.
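Because only PCM WAV files are accepted directly, it can help to check a file before handing it to the recognizer. The sketch below uses Python's standard `wave` module, which itself only reads uncompressed PCM WAV data. The helper name `is_pcm_wav` is hypothetical, and the set of files the standard library accepts may not match exactly what onnx-asr accepts.

```python
import os
import tempfile
import wave


def is_pcm_wav(path):
    """Return True if the stdlib wave module can open the file as uncompressed PCM."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getcomptype() == "NONE"
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, truncated, or an unsupported encoding
        return False


# Demo: write a short 16-bit mono PCM file and a non-WAV file, then check both
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.wav")
with wave.open(good_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)       # 16-bit samples
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 160)

bad_path = os.path.join(tmp_dir, "bad.wav")
with open(bad_path, "wb") as out:
    out.write(b"not a wav file at all")

good_result = is_pcm_wav(good_path)
bad_result = is_pcm_wav(bad_path)
print(good_result, bad_result)
```

Files that fail the check would need converting to PCM WAV first, or loading through a library such as soundfile as the caveat above suggests.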