C/C++ port of an audio foundation model
SenseVoice.cpp is a C/C++ port of the FunASR SenseVoice model. It offers audio understanding capabilities: automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event classification/detection (AEC/AED). It targets efficient on-device deployment with low inference latency and supports multilingual ASR (Chinese, Cantonese, English, Japanese, Korean), emotion recognition, and event detection.
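Because one decoding pass covers language, emotion, and event labels as well as the transcript, downstream code usually has to separate those labels from the text. The sketch below assumes the decoded string begins with tags of the form `<|...|>` (for example a language tag followed by an emotion tag), as in the upstream SenseVoice convention. That format is an assumption here, and `TaggedResult` and `split_tags` are hypothetical helper names, not part of SenseVoice.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for one decoded segment: leading tags plus transcript.
struct TaggedResult {
    std::vector<std::string> tags;  // tag names in order of appearance
    std::string text;               // everything after the last leading tag
};

// Peel leading "<|...|>" tokens off a decoded string.
// The tag syntax is an assumption for illustration, not a documented API.
inline TaggedResult split_tags(const std::string& decoded) {
    TaggedResult out;
    std::size_t pos = 0;
    while (decoded.compare(pos, 2, "<|") == 0) {
        const std::size_t end = decoded.find("|>", pos + 2);
        if (end == std::string::npos) break;  // unterminated tag: keep the rest as text
        out.tags.push_back(decoded.substr(pos + 2, end - pos - 2));
        pos = end + 2;
    }
    out.text = decoded.substr(pos);
    return out;
}
```

Calling `split_tags` on a string with three leading tags yields three entries in `tags` and the remaining transcript in `text`. A string with no tags comes back unchanged in `text`.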
How It Works
Built on the ggml inference framework, SenseVoice.cpp keeps external dependencies to a minimum. Feature extraction is modeled on the kaldi-native-fbank library and can run multi-threaded. Decoding uses flash attention, and the model can be quantized (Q3, Q4, Q5, Q6, Q8) to reduce size and speed up inference. The project supports CPU, Metal (Apple Silicon), BLAS, CUDA, and Vulkan backends, with experimental support for Ascend NPU.
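To make the quantization options more concrete, here is a minimal sketch of block-wise 8-bit quantization: weights are split into fixed-size blocks, and each block stores one scale plus small integers. It is a simplified illustration, not ggml's actual storage layout or code. The block size of 32, the `float` scale, and the names `QuantBlock`, `quantize_q8`, and `dequantize_q8` are assumptions chosen for readability.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlockSize = 32;  // illustrative block size (assumption)

struct QuantBlock {
    float scale;                // real formats often store this in half precision
    std::int8_t q[kBlockSize];  // quantized values in [-127, 127]
};

// Quantize block by block; a trailing partial block is zero-padded.
inline std::vector<QuantBlock> quantize_q8(const std::vector<float>& x) {
    std::vector<QuantBlock> blocks((x.size() + kBlockSize - 1) / kBlockSize);
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        const std::size_t begin = b * kBlockSize;
        const std::size_t end = std::min(begin + kBlockSize, x.size());
        float amax = 0.0f;  // largest magnitude in this block
        for (std::size_t i = begin; i < end; ++i) amax = std::max(amax, std::fabs(x[i]));
        const float scale = amax / 127.0f;
        const float inv = scale > 0.0f ? 1.0f / scale : 0.0f;  // all-zero block maps to zeros
        blocks[b].scale = scale;
        for (std::size_t j = 0; j < kBlockSize; ++j) {
            const std::size_t i = begin + j;
            const float v = i < end ? x[i] * inv : 0.0f;
            blocks[b].q[j] = static_cast<std::int8_t>(std::lround(v));
        }
    }
    return blocks;
}

// Reconstruct the first n values; each one is off by at most about half a scale step.
inline std::vector<float> dequantize_q8(const std::vector<QuantBlock>& blocks, std::size_t n) {
    std::vector<float> y(n);
    for (std::size_t i = 0; i < n; ++i) {
        const QuantBlock& blk = blocks[i / kBlockSize];
        y[i] = blk.scale * static_cast<float>(blk.q[i % kBlockSize]);
    }
    return y;
}
```

Lower-bit variants follow the same idea with fewer bits per value, trading more reconstruction error for a smaller model. Per-block scales keep a single outlier from degrading precision across the whole tensor.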
Quick Start & Requirements
Build: cmake and make.
Prerequisites: git lfs, cmake, a C++ compiler. Optional: libsdl2-dev for streaming.
Highlighted Details
Maintenance & Community
The project acknowledges inspiration and code borrowing from whisper.cpp, FunASR, and kaldi-native-fbank. The paraformer.cpp project is mentioned as a related effort that will continue to be updated.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is a port and may not perfectly replicate the original model's behavior or performance. Some backends (e.g., Ascend NPU) are marked as untested. The streaming example requires libsdl2-dev.