sherpa-onnx  by k2-fsa

Speech toolkit for local, offline speech AI tasks via ONNX

created 2 years ago
6,855 stars

Top 7.5% on sourcepulse

Project Summary

Sherpa-ONNX is an offline-capable suite of speech and audio processing tools covering speech-to-text (ASR), text-to-speech (TTS), speaker diarization, voice activity detection (VAD), audio tagging, and keyword spotting. It targets developers and researchers who need cross-platform speech AI that runs locally, which keeps audio on the device, avoids network round-trips, and works across a wide range of hardware.

How It Works

The project uses ONNX Runtime for efficient inference of neural network models, with optional hardware acceleration. It supports model families including Zipformer, Paraformer, and Whisper, as well as models exported from NVIDIA NeMo, covering a range of speech tasks. The architecture is designed for flexibility, so the same models can be integrated into desktop, mobile, web, and embedded applications.

Quick Start & Requirements

  • Installation: Primarily via pip for Python, with pre-built binaries and SDKs for C++, Android, iOS, and WebAssembly.
  • Dependencies: ONNX Runtime, various Python libraries (e.g., soundfile, numpy), and potentially CUDA for GPU acceleration. Specific model requirements vary.
  • Resources: Model sizes vary; smaller models are suitable for embedded systems like Raspberry Pi, while larger models may require more powerful hardware.
  • Links: Documentation: https://k2-fsa.github.io/sherpa/onnx/
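
As a rough illustration of the pip-based Python route, the sketch below transcribes one audio file with a non-streaming transducer model. It assumes `pip install sherpa-onnx soundfile numpy` and a model directory that has already been downloaded. The file names `encoder.onnx`, `decoder.onnx`, `joiner.onnx`, and `tokens.txt`, and the helper names `transducer_files` and `transcribe`, are placeholders for this example; real model archives typically use longer, model-specific file names, so check the documentation for the model you pick.

```python
from pathlib import Path


def transducer_files(model_dir):
    """Map recognizer argument names to files inside model_dir.

    The file names are placeholders (an assumption for this sketch);
    real model archives usually use model-specific names.
    """
    root = Path(model_dir)
    return {
        "encoder": str(root / "encoder.onnx"),
        "decoder": str(root / "decoder.onnx"),
        "joiner": str(root / "joiner.onnx"),
        "tokens": str(root / "tokens.txt"),
    }


def transcribe(wav_path, model_dir, num_threads=1):
    """Decode one audio file with a non-streaming transducer model."""
    # Imported lazily so transducer_files() works without these packages.
    import numpy as np
    import sherpa_onnx
    import soundfile as sf

    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        num_threads=num_threads, **transducer_files(model_dir)
    )

    # Read as float32 and keep only the first channel as a contiguous array.
    data, sample_rate = sf.read(wav_path, dtype="float32", always_2d=True)
    samples = np.ascontiguousarray(data[:, 0])

    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    return stream.result.text
```

A call such as `transcribe("speech.wav", "models/demo")` would return the recognized text, where both paths are hypothetical. Other model families (for example Paraformer or Whisper) are loaded through different constructors with their own file arguments.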

Highlighted Details

  • Supports 12 programming languages (C++, C, Python, JavaScript, Java, C#, Kotlin, Swift, Go, Dart, Rust, Pascal).
  • Extensive platform support including x86, ARM (32/64-bit), RISC-V, Android, iOS, Windows, macOS, Linux, and WebAssembly.
  • Offers both streaming and non-streaming ASR, TTS, speaker diarization, VAD, audio tagging, and keyword spotting.
  • Provides numerous pre-trained models for various languages and tasks, with links to Hugging Face Spaces and downloadable archives.
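
The difference between streaming and non-streaming ASR can be sketched in a few lines: a streaming recognizer accepts audio in small chunks and reports partial text after each one, instead of waiting for the whole recording. The sketch below assumes `recognizer` is a `sherpa_onnx.OnlineRecognizer` constructed elsewhere from a streaming model; the helper names `chunks` and `stream_decode` and the 0.1 second chunk length are choices made for this example, not part of the library.

```python
def chunks(samples, chunk_size):
    """Yield consecutive slices of samples, each at most chunk_size long."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def stream_decode(recognizer, samples, sample_rate, chunk_seconds=0.1):
    """Feed audio chunk by chunk and yield the partial text after each chunk.

    `recognizer` is assumed to behave like sherpa_onnx.OnlineRecognizer;
    `samples` is a 1-D sequence of float samples (for example a NumPy array).
    """
    stream = recognizer.create_stream()
    chunk_size = max(1, int(sample_rate * chunk_seconds))
    for chunk in chunks(samples, chunk_size):
        stream.accept_waveform(sample_rate, chunk)
        # Decode for as long as enough audio is buffered for another step.
        while recognizer.is_ready(stream):
            recognizer.decode_stream(stream)
        yield recognizer.get_result(stream)
```

In a live application the chunks would come from a microphone callback rather than a pre-loaded array. File-based use normally also pads the tail with a little silence and signals the end of input so that the last words are flushed; both steps are omitted here for brevity.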

Maintenance & Community

The project is actively maintained by the k2-fsa team. Community engagement is encouraged via WeChat and QQ groups, with links provided in the documentation.

Licensing & Compatibility

The project is licensed under Apache 2.0, which permits commercial use and integration into closed-source applications. Individual pre-trained models may carry their own licenses, so check each model before redistributing it.

Limitations & Caveats

The large number of supported models and configurations means users must choose a model and manage its dependencies carefully for their specific use case. Some advanced features and model integrations may still be under active development.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 53
  • Issues (30d): 50
  • Star history: 1,091 stars in the last 90 days
