sherpa-onnx by k2-fsa

Speech toolkit for local, offline speech AI tasks via ONNX

Created 3 years ago

9,670 stars

Top 5.2% on SourcePulse

Project Summary

sherpa-onnx provides an offline-capable suite of speech and audio processing tools, including speech-to-text (ASR), text-to-speech (TTS), speaker diarization, and more. It targets developers and researchers who need cross-platform speech AI that runs locally: audio stays on the device (privacy), there is no network round trip (low latency), and the same toolkit runs on a wide range of hardware.

How It Works

The project uses ONNX Runtime for neural network inference, with hardware acceleration where the platform supports it. It supports a range of model families, including Zipformer, Paraformer, Whisper, and NeMo models, which together cover the speech tasks listed below. The architecture is designed to be embedded in other applications, from desktop software to embedded systems.

Quick Start & Requirements

Installation: pip for Python; pre-built binaries and SDKs for C++, Android, iOS, and WebAssembly.
Dependencies: ONNX Runtime, a few Python libraries (e.g., soundfile, numpy), and optionally CUDA for GPU acceleration. Requirements for specific models vary.
Resources: Model sizes vary; smaller models suit embedded systems such as Raspberry Pi, while larger models may need more powerful hardware.
Documentation: https://k2-fsa.github.io/sherpa/onnx/
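
As a rough illustration of the Python route, the sketch below transcribes a 16-bit mono WAV file with a non-streaming transducer model. It assumes the package was installed with `pip install sherpa-onnx` and that a transducer model was downloaded separately; the four model paths are placeholders, and `read_wave_mono_float` and `transcribe` are hypothetical helper names, not part of the library. The `OfflineRecognizer.from_transducer`, `create_stream`, `accept_waveform`, and `decode_stream` calls follow the project's published Python examples and should be checked against the documentation for the installed release.

```python
import sys
import wave
from array import array


def read_wave_mono_float(path):
    """Read a 16-bit mono WAV file; return (sample_rate, samples in [-1, 1))."""
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        sample_rate = f.getframerate()
        pcm = array("h")
        pcm.frombytes(f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    return sample_rate, [s / 32768.0 for s in pcm]


def transcribe(wav_path, encoder, decoder, joiner, tokens):
    """Decode one file offline. All model paths are caller-supplied placeholders."""
    # Imported lazily so the WAV helper above works without these packages.
    import numpy as np
    import sherpa_onnx  # assumption: installed via `pip install sherpa-onnx`

    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=encoder, decoder=decoder, joiner=joiner, tokens=tokens
    )
    sample_rate, samples = read_wave_mono_float(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, np.asarray(samples, dtype=np.float32))
    recognizer.decode_stream(stream)
    return stream.result.text
```

The WAV reader uses only the standard library, so the audio-loading step can be checked without a model; soundfile, mentioned above, is a common alternative for other audio formats.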

Highlighted Details

Supports 12 programming languages (C++, C, Python, JavaScript, Java, C#, Kotlin, Swift, Go, Dart, Rust, Pascal).
Runs on x86, ARM (32/64-bit), and RISC-V architectures, and on Android, iOS, Windows, macOS, Linux, and WebAssembly.
Offers streaming and non-streaming ASR, plus TTS, speaker diarization, VAD, audio tagging, and keyword spotting.
Provides numerous pre-trained models for various languages and tasks, with links to Hugging Face Spaces and downloadable archives.
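
The practical difference between streaming and non-streaming ASR is how audio reaches the recognizer: a streaming recognizer accepts small chunks and can report partial text as it goes. The following is a minimal sketch of that loop, assuming a streaming transducer model and the `OnlineRecognizer` class from the Python package. `chunk_samples` and `stream_transcribe` are hypothetical helpers, the 0.1 s chunk size is arbitrary, the model paths are placeholders, and the library calls should be verified against the documentation.

```python
def chunk_samples(samples, chunk_size):
    """Yield consecutive slices of at most chunk_size samples."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def stream_transcribe(samples, sample_rate, encoder, decoder, joiner, tokens):
    """Feed audio in 0.1 s chunks and yield the partial text after each chunk."""
    # Imported lazily so chunk_samples can be used without these packages.
    import numpy as np
    import sherpa_onnx  # assumption: installed via `pip install sherpa-onnx`

    recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
        tokens=tokens, encoder=encoder, decoder=decoder, joiner=joiner
    )
    stream = recognizer.create_stream()
    audio = np.asarray(samples, dtype=np.float32)

    for chunk in chunk_samples(audio, int(0.1 * sample_rate)):
        stream.accept_waveform(sample_rate, chunk)
        # Decode whatever the recognizer has enough audio for so far.
        while recognizer.is_ready(stream):
            recognizer.decode_stream(stream)
        yield recognizer.get_result(stream)

    # Signal end of input and flush the remaining frames.
    stream.input_finished()
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)
    yield recognizer.get_result(stream)
```

In a live application the chunks would come from a microphone callback rather than a pre-loaded array; the decode loop stays the same.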

Maintenance & Community

The project is actively maintained by the k2-fsa team. Community engagement is encouraged via WeChat and QQ groups, with links provided in the documentation.

Licensing & Compatibility

The project appears to be licensed primarily under Apache 2.0, which permits commercial use and integration into closed-source applications. Licenses for individual pre-trained models may differ from the code license.

Limitations & Caveats

The large number of supported models and configurations means users must choose a model, and the dependencies that go with it, for their specific use case. Some advanced features and model integrations may still be under active development.

Explore Similar Projects

praises by ElmTran

alibabacloud-bailian-speech-demo by aliyun

LiveWhisper by Nikorasu

AIVoiceChat by KoljaB

pywhispercpp by absadiki

ollama-voice-mac by apeatling

fast-voice-assistant by dsa

speech_course by yandexdataschool

Easy-Voice-Toolkit by Spr-Aachen

ASR-LLM-TTS by ABexit

sherpa-ncnn by k2-fsa

speech_recognition by Uberi