sherpa-ncnn by k2-fsa

Offline STT engine for real-time speech recognition and VAD

Created 3 years ago

1,637 stars

Top 25.3% on SourcePulse

Project Summary

Sherpa-ncnn provides efficient, offline, real-time speech recognition and voice activity detection (VAD) for a wide range of devices and architectures. It targets developers building applications requiring on-device ASR and VAD, offering broad platform support and multiple language bindings.

How It Works

This project uses the ncnn inference framework for optimized execution on diverse hardware, including CPUs and mobile platforms. It supports streaming speech-to-text and VAD, so audio is processed in real time without an internet connection. The build is designed for static linking, producing executables with few system dependencies beyond the standard libraries.
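To make "streaming with VAD" concrete, here is a minimal Python sketch that splits 16 kHz mono audio into fixed-size chunks, as a live audio callback would deliver them, and marks speech spans with a simple RMS-energy threshold. This energy detector is a swapped-in stand-in, not sherpa-ncnn's own VAD, and the sample rate, chunk length, and threshold are assumptions. In a real application each speech chunk would be passed to the recognizer's streaming decoder instead of only being recorded.

```python
import math
from typing import Iterator, List, Optional, Tuple

SAMPLE_RATE = 16000       # assumed: 16 kHz mono input
CHUNK_MS = 100            # assumed streaming chunk length in milliseconds
ENERGY_THRESHOLD = 0.01   # assumed RMS threshold; tune per microphone


def chunks(samples: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size chunks, as a live audio callback would."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def rms(chunk: List[float]) -> float:
    """Root-mean-square energy of float samples in [-1, 1]."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))


def segment_speech(samples: List[float]) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans whose chunk energy exceeds the threshold.

    Simple energy-based VAD for illustration only; NOT sherpa-ncnn's VAD.
    """
    chunk_size = SAMPLE_RATE * CHUNK_MS // 1000
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    pos = 0
    for chunk in chunks(samples, chunk_size):
        is_speech = rms(chunk) > ENERGY_THRESHOLD
        if is_speech and start is None:
            start = pos                     # speech begins at this chunk
        elif not is_speech and start is not None:
            segments.append((start, pos))   # speech ended at this chunk boundary
            start = None
        # A real app would feed speech chunks to the streaming recognizer here.
        pos += len(chunk)
    if start is not None:
        segments.append((start, pos))       # close a segment still open at end of input
    return segments


if __name__ == "__main__":
    silence = [0.0] * SAMPLE_RATE           # 1 s of silence
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
    print(segment_speech(silence + tone + silence))
```

Running the demo prints one span covering the one-second tone, `[(16000, 32000)]`. Because the decision is made per chunk, segment boundaries are only as precise as the chunk length.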

Quick Start & Requirements

Installation and usage details are available at https://k2-fsa.github.io/sherpa/ncnn/index.html.
Pre-trained models can be found at https://github.com/k2-fsa/sherpa-ncnn/releases/tag/models.
Can be compiled from source with static linking.
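ncnn stores each network as a `.param` file (graph structure) and a `.bin` file (weights). Assuming a downloaded transducer model split into encoder, decoder, and joiner networks plus a `tokens.txt` vocabulary, a small helper like the one below can check that a model directory is complete before loading it. The file names used here (`encoder.ncnn.param` and so on) are hypothetical placeholders; check the actual names inside the release archive you download.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical naming scheme; real release archives may use different file names.
PARTS = ("encoder", "decoder", "joiner")


def expected_files(model_dir: Path) -> Dict[str, Path]:
    """Map each required artifact to its expected path under model_dir."""
    files = {"tokens": model_dir / "tokens.txt"}
    for part in PARTS:
        files[f"{part}_param"] = model_dir / f"{part}.ncnn.param"  # network structure
        files[f"{part}_bin"] = model_dir / f"{part}.ncnn.bin"      # network weights
    return files


def missing_files(model_dir: Path) -> List[str]:
    """Return the keys of required files that do not exist, sorted alphabetically."""
    return sorted(key for key, path in expected_files(model_dir).items() if not path.is_file())


if __name__ == "__main__":
    import sys

    directory = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    missing = missing_files(directory)
    print("model directory complete" if not missing else f"missing: {missing}")
```

Usage: `python check_model.py path/to/model-dir` (script name hypothetical). Checking up front turns a confusing load-time failure into a clear list of missing files.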

Highlighted Details

Supports x86, x86_64, ARM (32/64-bit), and RISC-V (64-bit) architectures.
Available for Linux, macOS, Windows, Android, iOS, and WebAssembly.
Provides APIs for C++, C, Python, JavaScript, Go, C#, Kotlin, and Swift.
Does not depend on PyTorch or other heavy inference frameworks, relying solely on ncnn.

Maintenance & Community

Community channels are listed at https://k2-fsa.github.io/sherpa/social-groups.html.
Related projects include sherpa-onnx and the core sherpa library.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project's licensing is not clearly stated in the README, which may impact commercial adoption. Specific performance benchmarks or detailed resource requirements for various platforms are not provided.
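Since no benchmarks are published, users may want to measure throughput on their own target hardware. A common metric for speech recognition is the real-time factor (RTF): processing time divided by the duration of the audio processed, where values below 1 mean faster than real time. The sketch below times any decode callable; `fake_decode` is a hypothetical placeholder for a call into whichever sherpa-ncnn binding you use.

```python
import time
from typing import Callable, Sequence


def real_time_factor(decode: Callable[[Sequence[float]], object],
                     samples: Sequence[float], sample_rate: int) -> float:
    """Return processing time / audio duration for one decode call.

    RTF < 1 means the decoder runs faster than real time.
    """
    if sample_rate <= 0 or not samples:
        raise ValueError("need a positive sample rate and non-empty audio")
    audio_seconds = len(samples) / sample_rate
    start = time.perf_counter()          # monotonic, high-resolution timer
    decode(samples)
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


if __name__ == "__main__":
    def fake_decode(samples):            # hypothetical stand-in for a real recognizer call
        return sum(samples)

    audio = [0.0] * (16000 * 5)          # 5 s of 16 kHz audio
    print(f"RTF: {real_time_factor(fake_decode, audio, 16000):.4f}")
```

For a representative figure, time several runs on realistic audio and report the median, since the first call may include one-time setup costs.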


Explore Similar Projects

curses by mmpneo

Auralis by astramind-ai

RuntimeSpeechRecognizer by gtreshchev

realtime-transcription-fastrtc by sofdog-gh

obs-localvocal by royshil

xtts-api-server by daswer123

LanguageLeapAI by SociallyIneptWeeb

RealtimeSTT by KoljaB

piper by rhasspy

wenet by wenet-e2e

sherpa-onnx by k2-fsa

speech_recognition by Uberi