API and WebSocket server for real-time streaming voice applications
This project provides an API and WebSocket server for real-time speech processing, offering features like Voice Activity Detection (VAD), streaming transcription, and speaker verification. It targets developers building voice-enabled applications that require low-latency audio analysis and speaker identification.
How It Works
The server is built on the SenseVoice framework and integrates VAD for efficient audio processing and real-time streaming recognition. Speaker verification compares incoming audio against pre-registered voice samples. Recent optimizations accumulate audio data for improved accuracy and add log-probabilities to the confidence scores.
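The verification step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the `AccumulatingVerifier` class, the `embed` callback, the `min_samples` and `threshold` parameters, and the use of cosine similarity are all hypothetical stand-ins. A real deployment would obtain embeddings from a speaker model rather than the caller-supplied function used here.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class AccumulatingVerifier:
    """Buffers audio chunks, then compares the buffer to enrolled speakers.

    Hypothetical sketch: `embed` stands in for a real speaker-embedding model,
    and `enrolled` maps speaker names to embeddings of pre-registered samples.
    """

    def __init__(
        self,
        embed: Callable[[List[float]], Vector],
        enrolled: Dict[str, Vector],
        min_samples: int = 16000,
        threshold: float = 0.5,
    ) -> None:
        if not enrolled:
            raise ValueError("at least one enrolled speaker is required")
        self.embed = embed
        self.enrolled = enrolled
        self.min_samples = min_samples
        self.threshold = threshold
        self._buffer: List[float] = []

    def feed(self, chunk: Sequence[float]) -> Optional[Tuple[str, float, bool]]:
        """Add a chunk; return None until enough audio has accumulated.

        Once `min_samples` is reached, returns (best_name, score, accepted)
        and clears the buffer so the next decision starts fresh.
        """
        self._buffer.extend(chunk)
        if len(self._buffer) < self.min_samples:
            return None
        probe = self.embed(self._buffer)
        self._buffer = []
        # Pick the enrolled speaker whose embedding is closest to the probe.
        best_name, best_score = max(
            ((name, cosine_similarity(probe, ref)) for name, ref in self.enrolled.items()),
            key=lambda pair: pair[1],
        )
        return best_name, best_score, best_score >= self.threshold
```

Deferring the comparison until the buffer holds enough audio trades a little latency for a more stable embedding, which is one plausible reading of the accumulation optimization mentioned above.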
Quick Start & Requirements
Create the environment and install dependencies:
conda create -n api4sensevoice python=3.10
conda activate api4sensevoice
conda install -c conda-forge ffmpeg
pip install -r requirements.txt
Start the API server:
python server.py --port <port_number>
Start the WebSocket server:
python server_wss.py --port <port_number>
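Once the WebSocket server is running, a client needs a URL to connect to. The helper below is a minimal sketch of how such a URL might be assembled, including the `sv=1` query parameter that this summary associates with speaker verification. The function name `build_ws_url`, its parameters, and the `/ws/transcribe` path used in the usage line are assumptions for illustration. Check the README for the actual endpoint path and scheme.

```python
from urllib.parse import urlencode, urlunsplit


def build_ws_url(
    host: str,
    port: int,
    path: str,
    speaker_verification: bool = False,
    secure: bool = False,
) -> str:
    """Assemble a WebSocket URL, optionally adding the sv=1 query parameter.

    Hypothetical helper: the endpoint path is supplied by the caller because
    this summary does not state it.
    """
    scheme = "wss" if secure else "ws"
    query = urlencode({"sv": 1}) if speaker_verification else ""
    return urlunsplit((scheme, f"{host}:{port}", path, query, ""))


# Usage, with an assumed endpoint path:
url = build_ws_url("localhost", 8000, "/ws/transcribe", speaker_verification=True)
```

The resulting string can be passed to any WebSocket client library.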
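Voice samples for speaker verification are pre-registered in the speaker directory.
Highlighted Details
Speaker verification is controlled by a query parameter (sv=1) on the WebSocket endpoint.
Maintenance & Community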
The project welcomes contributions and provides channels for bug reporting and feature requests. Specific community links (Discord/Slack) or social handles are not explicitly mentioned in the README.
Licensing & Compatibility
This project is licensed under the MIT License, permitting commercial use and integration with closed-source applications.
Limitations & Caveats
The project is actively under development, with latency optimization listed as a future enhancement. The README does not specify hardware requirements beyond the need for ffmpeg.