SenseVoice.cpp by lovemefan

C/C++ port of an audio foundation model

created 1 year ago
407 stars

Top 72.6% on sourcepulse

Project Summary

SenseVoice.cpp is a C/C++ port of FunASR's SenseVoice model. It provides several audio understanding capabilities: automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event classification/detection (AEC/AED). It targets efficient on-device deployment with low inference latency and supports multilingual ASR (Mandarin Chinese, Cantonese, English, Japanese, and Korean), emotion recognition, and audio event detection.

How It Works

SenseVoice.cpp is built on the ggml inference framework, which keeps external dependencies to a minimum. Feature extraction is based on the kaldi-native-fbank library and can run multi-threaded. Decoding uses flash attention, and Q3, Q4, Q5, Q6, and Q8 quantization options are available to trade accuracy for speed and memory. Supported backends are CPU, Metal (Apple Silicon), BLAS, CUDA, and Vulkan, with experimental support for Ascend NPU.
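Kaldi-style filterbank ("fbank") features are log energies of triangular filters spaced evenly on the mel scale. The sketch below illustrates that computation in C++. It is not the code used by SenseVoice.cpp or kaldi-native-fbank: the function names `kaldi_style_mel_banks` and `log_mel_frame` are hypothetical, and the parameters in the usage note (16 kHz audio, a 512-point FFT, 80 mel bins, 20 Hz to 8 kHz) are assumptions chosen for illustration. The mel formula and the epsilon floor applied before the log follow Kaldi's conventions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Kaldi's mel scale: mel(f) = 1127 * ln(1 + f / 700).
inline float mel_scale(float hz) { return 1127.0f * std::log(1.0f + hz / 700.0f); }

// Hypothetical helper: builds num_bins triangular filters over n_fft/2 + 1 FFT bins.
// As in Kaldi, the triangles are evenly spaced and linear in the mel domain.
std::vector<std::vector<float>> kaldi_style_mel_banks(int num_bins, int n_fft,
                                                      float sample_rate,
                                                      float low_hz, float high_hz) {
    assert(num_bins > 0 && n_fft > 0 && low_hz >= 0.0f && high_hz > low_hz);
    const int n_freqs = n_fft / 2 + 1;
    const float bin_width = sample_rate / n_fft;  // Hz per FFT bin
    const float mel_low = mel_scale(low_hz);
    const float mel_high = mel_scale(high_hz);
    const float delta = (mel_high - mel_low) / (num_bins + 1);  // spacing between centers

    std::vector<std::vector<float>> banks(num_bins, std::vector<float>(n_freqs, 0.0f));
    for (int b = 0; b < num_bins; ++b) {
        const float left = mel_low + b * delta;
        const float center = left + delta;
        const float right = center + delta;
        for (int i = 0; i < n_freqs; ++i) {
            const float mel = mel_scale(bin_width * i);
            if (mel > left && mel < right) {
                // Rising edge up to the center, falling edge after it.
                banks[b][i] = (mel <= center) ? (mel - left) / (center - left)
                                              : (right - mel) / (right - center);
            }
        }
    }
    return banks;
}

// Hypothetical helper: applies the filters to one frame's power spectrum and
// returns the log mel energies (one fbank frame).
std::vector<float> log_mel_frame(const std::vector<std::vector<float>>& banks,
                                 const std::vector<float>& power_spectrum) {
    std::vector<float> out(banks.size());
    for (size_t b = 0; b < banks.size(); ++b) {
        assert(banks[b].size() == power_spectrum.size());
        double energy = 0.0;
        for (size_t i = 0; i < power_spectrum.size(); ++i)
            energy += banks[b][i] * power_spectrum[i];
        // Kaldi floors energies at float epsilon so the log never sees zero.
        const double floor = static_cast<double>(std::numeric_limits<float>::epsilon());
        out[b] = static_cast<float>(std::log(std::max(energy, floor)));
    }
    return out;
}
```

For example, `kaldi_style_mel_banks(80, 512, 16000.0f, 20.0f, 8000.0f)` yields 80 filters over 257 FFT bins. Framing, windowing, pre-emphasis, and the FFT itself are omitted here. Because each frame is processed independently, this per-frame work is the kind that can be split across threads.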

Quick Start & Requirements

  • Install/Run: Clone the repository, download GGUF models from Hugging Face or ModelScope (or convert the official models with the provided scripts), then build with cmake and make.
  • Prerequisites: Git LFS, CMake, and a C++ compiler. Optional: libsdl2-dev for the streaming example.
  • Resources: Models are typically under 500 MB. CPU-only inference is supported.
  • Links: Hugging Face, ModelScope

Highlighted Details

  • Low inference latency and on-device deployment focus.
  • Support for multiple audio understanding tasks beyond ASR.
  • Quantization options (Q3-Q8) for reduced memory footprint and faster inference.
  • Broad backend support including CPU, Metal, CUDA, and Vulkan.
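The Q3 to Q8 options refer to ggml's block quantization formats. As a rough illustration, the sketch below mimics the idea behind ggml's Q8_0 format: weights are grouped into blocks of 32, each block stores one scale d = max|x| / 127, and each weight is stored as the int8 value round(x / d). This is a simplified sketch, not ggml's code. ggml stores the scale as fp16 while this version uses float, and the names `BlockQ8`, `quantize_q8_blocks`, and `dequantize_q8_blocks` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kBlock = 32;  // Q8_0 groups weights in blocks of 32

// Hypothetical block layout. Simplification: ggml keeps the scale as fp16.
struct BlockQ8 {
    float d;                         // per-block scale
    std::array<int8_t, kBlock> qs;   // quantized weights
};

std::vector<BlockQ8> quantize_q8_blocks(const std::vector<float>& x) {
    assert(x.size() % kBlock == 0);
    std::vector<BlockQ8> out(x.size() / kBlock);
    for (size_t b = 0; b < out.size(); ++b) {
        const float* src = x.data() + b * kBlock;
        float amax = 0.0f;  // largest magnitude in the block
        for (int i = 0; i < kBlock; ++i) amax = std::max(amax, std::fabs(src[i]));
        const float d = amax / 127.0f;                  // maps amax to +/-127
        const float id = d != 0.0f ? 1.0f / d : 0.0f;   // all-zero block stays zero
        out[b].d = d;
        for (int i = 0; i < kBlock; ++i)
            out[b].qs[i] = static_cast<int8_t>(std::lround(src[i] * id));
    }
    return out;
}

std::vector<float> dequantize_q8_blocks(const std::vector<BlockQ8>& blocks) {
    std::vector<float> y;
    y.reserve(blocks.size() * kBlock);
    for (const auto& blk : blocks)
        for (int8_t q : blk.qs) y.push_back(blk.d * q);  // x is approximately d * q
    return y;
}

// Storage cost in the real format: 32 int8 values plus one 2-byte fp16 scale per block.
constexpr double kQ8BitsPerWeight = (kBlock * 8 + 16) / double(kBlock);  // 8.5
```

Per weight, the reconstruction error is at most about half a quantization step (d / 2). The lower-bit variants broadly follow the same per-block scaling idea with fewer bits per value, which is where the reduced memory footprint comes from.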

Maintenance & Community

The project credits whisper.cpp, FunASR, and kaldi-native-fbank for inspiration and borrowed code. The related paraformer.cpp project is mentioned as an effort that will continue to receive updates.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

As a port, the project may not exactly reproduce the original model's behavior or accuracy. Some backends (e.g., Ascend NPU) are marked as untested, and the streaming example requires libsdl2-dev.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 74 stars in the last 90 days
