SenseVoice.cpp by lovemefan

C/C++ port of an audio foundation model

Created 1 year ago

518 stars

Top 60.7% on SourcePulse

Project Summary

SenseVoice.cpp is a C/C++ port of the FunASR SenseVoice model. It provides several audio understanding capabilities:

- automatic speech recognition (ASR)
- spoken language identification (LID)
- speech emotion recognition (SER)
- acoustic event classification/detection (AEC/AED)

It targets efficient on-device deployment with low inference latency. Its ASR is multilingual, covering Chinese, Cantonese, English, Japanese, and Korean.
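A single decoding pass can carry several of these tasks at once. Upstream SenseVoice prefixes its transcript with tags such as `<|en|><|NEUTRAL|><|Speech|><|woitn|>`. In order, these give the language, the emotion, the audio event, and an inverse-text-normalization flag.

This summary does not confirm that SenseVoice.cpp prints the same convention, so the sketch below treats it as an assumption. `TaggedTranscript` and `parse_tags` are hypothetical names, not part of the project's API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: leading tags and the plain transcript, split apart.
struct TaggedTranscript {
    std::vector<std::string> tags;  // e.g. {"en", "NEUTRAL", "Speech", "woitn"}
    std::string text;               // transcript with the tags removed
};

// Peels leading "<|...|>" tags off an output string (assumed upstream format).
TaggedTranscript parse_tags(const std::string& raw) {
    TaggedTranscript out;
    std::size_t pos = 0;
    while (raw.compare(pos, 2, "<|") == 0) {
        const std::size_t end = raw.find("|>", pos + 2);
        if (end == std::string::npos) break;  // unterminated tag: keep the rest as text
        out.tags.push_back(raw.substr(pos + 2, end - pos - 2));
        pos = end + 2;
    }
    out.text = raw.substr(pos);
    return out;
}
```

Under the assumed convention, LID, SER, and AED results come from the tag positions rather than from separate model runs.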

How It Works

Built on the ggml inference framework, SenseVoice.cpp keeps external dependencies to a minimum. Feature extraction follows the kaldi-native-fbank library and can run across multiple threads. The project also supports flash attention decoding and offers Q3, Q4, Q5, Q6, and Q8 quantization for smaller, faster models. Supported backends are CPU, Metal (Apple Silicon), BLAS, CUDA, and Vulkan, with experimental support for Ascend NPU.
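Framing is what makes multi-threaded feature extraction straightforward. Once the waveform is cut into overlapping frames, each frame's features can be computed independently. The sketch rests on these assumptions:

- It uses common Kaldi defaults: 16 kHz audio, a 25 ms window, a 10 ms shift, and `snip_edges=true`.
- It computes per-frame log energy as a stand-in for the full mel filterbank.

It is not SenseVoice.cpp's actual extractor, and `extract_features` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed Kaldi-style defaults at 16 kHz: 25 ms window, 10 ms shift.
constexpr std::size_t kFrameLength = 400;  // 0.025 s * 16000 Hz
constexpr std::size_t kFrameShift = 160;   // 0.010 s * 16000 Hz

// Frame count with snip_edges=true: only frames that fit entirely in the signal.
std::size_t num_frames(std::size_t num_samples) {
    if (num_samples < kFrameLength) return 0;
    return 1 + (num_samples - kFrameLength) / kFrameShift;
}

// Stand-in for a real fbank frame (window + FFT + mel filters): log energy only.
float frame_log_energy(const std::vector<float>& pcm, std::size_t frame) {
    const std::size_t start = frame * kFrameShift;
    assert(start + kFrameLength <= pcm.size());
    double energy = 0.0;
    for (std::size_t i = 0; i < kFrameLength; ++i) {
        const double s = pcm[start + i];
        energy += s * s;
    }
    return static_cast<float>(std::log(std::max(energy, 1e-10)));  // floor avoids log(0)
}

// Frames are independent, so each thread fills its own contiguous range of outputs.
std::vector<float> extract_features(const std::vector<float>& pcm, unsigned n_threads) {
    const std::size_t n = num_frames(pcm.size());
    std::vector<float> feats(n);
    n_threads = std::max(1u, n_threads);
    const std::size_t chunk = (n + n_threads - 1) / n_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // fewer frames than threads
        workers.emplace_back([&pcm, &feats, begin, end] {
            for (std::size_t f = begin; f < end; ++f) feats[f] = frame_log_energy(pcm, f);
        });
    }
    for (auto& w : workers) w.join();
    return feats;
}
```

Take one second of 16 kHz audio (16,000 samples). This yields 1 + floor((16000 - 400) / 160) = 98 frames, and the result is identical for any thread count because no two threads write the same output slot.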

Quick Start & Requirements

Install/Run: Clone the repository. Then either download prebuilt GGUF models from Hugging Face or ModelScope, or convert the official models with the provided scripts. Build with CMake and make.
Prerequisites: Git LFS, CMake, and a C++ compiler. Optional: libsdl2-dev for the streaming example.
Resources: Models are typically under 500 MB, and CPU-only inference is supported.
Links: Hugging Face, ModelScope
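Model file size can be estimated from bits per weight. The sketch assumes the block layouts of ggml's legacy quantization formats:

- Q8_0: 32 int8 weights plus a 16-bit scale, or 34 bytes per 32 weights.
- Q4_0: 32 4-bit weights plus the scale, or 18 bytes.
- Q5_0: Q4_0 plus 4 bytes of high bits, or 22 bytes.

The parameter count in the usage note is hypothetical, not SenseVoice's actual size. The estimate also ignores unquantized tensors and file metadata.

```cpp
#include <cassert>
#include <cstdint>

// Assumed ggml block layouts: storage per block of `block_size` weights.
struct QuantFormat {
    const char* name;
    int block_size;   // weights per block
    int block_bytes;  // bytes per block, including the scale
};

constexpr QuantFormat kF16{"F16", 1, 2};
constexpr QuantFormat kQ8_0{"Q8_0", 32, 34};  // 32 x int8 + fp16 scale
constexpr QuantFormat kQ5_0{"Q5_0", 32, 22};  // 16 B nibbles + 4 B high bits + fp16 scale
constexpr QuantFormat kQ4_0{"Q4_0", 32, 18};  // 16 B nibbles + fp16 scale

// Effective storage cost of one weight, scale overhead included.
double bits_per_weight(const QuantFormat& f) {
    return 8.0 * f.block_bytes / f.block_size;
}

// Approximate tensor storage in bytes; requires n_params to fill whole blocks.
std::uint64_t estimated_bytes(std::uint64_t n_params, const QuantFormat& f) {
    assert(n_params % f.block_size == 0);
    return n_params / f.block_size * f.block_bytes;
}
```

Consider a hypothetical 250M-parameter model. It comes to 500 MB at F16, about 266 MB at Q8_0, and about 141 MB at Q4_0.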

Highlighted Details

Focus on low inference latency and on-device deployment.
Support for multiple audio understanding tasks beyond ASR.
Quantization options (Q3-Q8) for reduced memory footprint and faster inference.
Broad backend support including CPU, Metal, CUDA, and Vulkan.
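The memory savings from quantization come from block-wise scaling. Each small block of weights stores one scale plus low-bit integers. The sketch below assumes ggml's Q8_0 layout: 32 weights per block, with scale = max|w| / 127. It stores the scale as `float` rather than fp16 for simplicity. `BlockQ8`, `quantize_q8`, and `dequantize_q8` are illustrative names, not ggml's API. The lower-bit formats apply the same principle with fewer bits per weight.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBlock = 32;  // weights per block (assumed, as in Q8_0)

struct BlockQ8 {
    float scale;          // ggml keeps this as fp16; float keeps the sketch simple
    int8_t q[kBlock];     // quantized weights in [-127, 127]
};

// Quantize: per block, map the largest |w| to 127 and round the rest.
std::vector<BlockQ8> quantize_q8(const std::vector<float>& x) {
    assert(x.size() % kBlock == 0);
    std::vector<BlockQ8> out(x.size() / kBlock);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* src = &x[b * kBlock];
        float amax = 0.0f;
        for (int i = 0; i < kBlock; ++i) amax = std::max(amax, std::fabs(src[i]));
        const float d = amax / 127.0f;
        const float inv = d > 0.0f ? 1.0f / d : 0.0f;  // all-zero block stays zero
        out[b].scale = d;
        for (int i = 0; i < kBlock; ++i)
            out[b].q[i] = static_cast<int8_t>(std::lround(src[i] * inv));
    }
    return out;
}

// Dequantize: each weight is its block scale times the stored integer.
std::vector<float> dequantize_q8(const std::vector<BlockQ8>& blocks) {
    std::vector<float> y;
    y.reserve(blocks.size() * kBlock);
    for (const auto& blk : blocks)
        for (int i = 0; i < kBlock; ++i) y.push_back(blk.scale * blk.q[i]);
    return y;
}
```

Each weight is rounded to the nearest multiple of its block's scale, so the round-trip error per weight is at most half that scale.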

Maintenance & Community

The project acknowledges inspiration and code borrowing from whisper.cpp, FunASR, and kaldi-native-fbank. The paraformer.cpp project is mentioned as a related effort that will continue to be updated.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

As a port, the project may not exactly replicate the original model's behavior or performance. Some backends (e.g., Ascend NPU) are marked as untested, and the streaming example requires libsdl2-dev.


Explore Similar Projects

Dolphin by DataoceanAI

RapidASR by RapidAI

zamia-speech by gooofy

pyctcdecode by kensho-technologies

Qwen2-Audio by QwenLM

athena by athena-team

Kimi-Audio by MoonshotAI

icefall by k2-fsa

SenseVoice by FunAudioLLM

sherpa-onnx by k2-fsa

FunASR by modelscope

PaddleSpeech by PaddlePaddle