Multilingual speech model for voice understanding
SenseVoice is a multilingual speech foundation model offering Automatic Speech Recognition (ASR), Spoken Language Identification (LID), Speech Emotion Recognition (SER), and Audio Event Detection (AED). It targets developers and researchers who need high-accuracy, low-latency speech processing across many languages, and its authors report gains in both recognition accuracy and inference speed over models such as Whisper.
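As a rough illustration of how one inference pass can serve all four tasks, the sketch below parses a single transcript string that carries LID, SER, and AED labels as inline tags ahead of the ASR text. The tag layout `<|language|><|EMOTION|><|Event|><|itn-flag|>text`, the sample string, and the names `RichTranscript` and `parse_rich_transcript` are assumptions for illustration, not a documented SenseVoice API; check the actual output of your SenseVoice version before relying on this format.

```python
import re
from dataclasses import dataclass

# Matches a single inline tag such as "<|en|>" and captures its label.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


@dataclass
class RichTranscript:
    """Hypothetical container; field names are illustrative, not a SenseVoice API."""
    language: str  # LID label, e.g. "en"
    emotion: str   # SER label, e.g. "NEUTRAL"
    event: str     # AED label, e.g. "Speech"
    itn: str       # inverse-text-normalization flag, e.g. "woitn"
    text: str      # ASR transcript


def parse_rich_transcript(raw: str) -> RichTranscript:
    """Split an (assumed) four-tag prefix from the transcript text."""
    labels = []
    pos = 0
    # Consume exactly four consecutive <|...|> tags from the start of the string.
    while len(labels) < 4:
        match = TAG_PATTERN.match(raw, pos)
        if match is None:
            raise ValueError(f"expected 4 leading tags, found {len(labels)}")
        labels.append(match.group(1))
        pos = match.end()
    language, emotion, event, itn = labels
    return RichTranscript(language, emotion, event, itn, raw[pos:].strip())


# Illustrative input in the assumed format.
sample = "<|en|><|NEUTRAL|><|Speech|><|woitn|>the tribal chieftain called for the boy"
result = parse_rich_transcript(sample)
print(result.language, result.emotion, result.event, result.text)
# prints: en NEUTRAL Speech the tribal chieftain called for the boy
```

Keeping the labels as a fixed-order prefix means a caller can route the same string to a transcript display, a language router, or an emotion/event logger without running separate models.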
How It Works
SenseVoice uses a non-autoregressive, end-to-end framework for efficient inference. It is trained on over 400,000 hours of multilingual data, which underpins its robust performance across all four tasks. Because a non-autoregressive model emits its output tokens in parallel rather than one at a time, inference latency stays low, making the model suitable for real-time applications.
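Non-autoregressive decoding can be made concrete with greedy CTC decoding, a common non-autoregressive scheme. This is a minimal sketch assuming a CTC-style output head with the blank symbol at index 0; it is not taken from the SenseVoice code, and `ctc_greedy_decode`, `BLANK_ID`, and the toy vocabulary are hypothetical. The key property is that each frame's token is chosen independently, so no step waits on an earlier prediction.

```python
from typing import List, Sequence

BLANK_ID = 0  # assumption: index 0 is reserved for the CTC blank symbol


def argmax(row: Sequence[float]) -> int:
    """Index of the highest score in one frame."""
    return max(range(len(row)), key=lambda i: row[i])


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank_id: int = BLANK_ID) -> List[int]:
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # Every frame is scored independently; no frame depends on an earlier output,
    # which is what lets a non-autoregressive model process all frames at once.
    best = [argmax(row) for row in frame_scores]
    tokens: List[int] = []
    prev = None
    for tok in best:
        # Keep a token only when it differs from the previous frame and is not blank.
        if tok != prev and tok != blank_id:
            tokens.append(tok)
        prev = tok
    return tokens


# Toy vocabulary (hypothetical): 0 = blank, 1 = "a", 2 = "b".
vocab = {1: "a", 2: "b"}
frames = [
    [0.1, 0.8, 0.1],    # "a"
    [0.2, 0.7, 0.1],    # "a" (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.2, 0.7],    # "b"
    [0.1, 0.1, 0.8],    # "b" (repeat, merged)
]
ids = ctc_greedy_decode(frames)
print("".join(vocab[i] for i in ids))  # prints "ab"
```

A blank frame between two identical tokens is how CTC represents a genuine repeated token; without it, the repeats would be merged into one. An autoregressive decoder, by contrast, would need one forward step per output token, which is where its extra latency comes from.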
Quick Start & Requirements
pip install -r requirements.txt
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
streaming-sensevoice sacrifices some accuracy for lower latency.