api4sensevoice by 0x5446

API and websocket server for real-time streaming voice applications

Created 1 year ago

538 stars

Top 59.0% on SourcePulse

Project Summary

This project provides an API and WebSocket server for real-time speech processing, offering features like Voice Activity Detection (VAD), streaming transcription, and speaker verification. It targets developers building voice-enabled applications that require low-latency audio analysis and speaker identification.

How It Works

The server builds on SenseVoice and combines VAD with real-time streaming recognition. Speaker verification compares incoming audio against pre-registered voice samples. Recent optimizations accumulate audio data to improve verification accuracy, and recognition results now include log-probabilities as a confidence measure.
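The comparison step can be illustrated with a minimal sketch. The source does not say how incoming audio is scored against the registered samples, so everything here is an assumption: each speaker is represented by a fixed-length embedding vector, scoring uses cosine similarity, and the names `cosine_similarity`, `best_speaker_match`, and the `threshold` default are hypothetical and not taken from the project.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_speaker_match(
    embedding: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical cutoff, not a project default
) -> Tuple[Optional[str], float]:
    """Return (speaker name, score) for the closest enrolled speaker.

    The name is None when no enrolled speaker scores at or above the threshold.
    """
    best_name: Optional[str] = None
    best_score = -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

Under this sketch, accumulating more audio before computing the embedding would give the comparison a longer, more stable sample to score, which is one plausible reading of the accuracy optimization described above.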

Quick Start & Requirements

Install dependencies using Conda and pip:

conda create -n api4sensevoice python=3.10
conda activate api4sensevoice
conda install -c conda-forge ffmpeg
pip install -r requirements.txt

Run the API server: python server.py --port <port_number>
Run the WebSocket server: python server_wss.py --port <port_number>
Speaker verification requires WAV audio files (16kHz, mono, 16-bit) placed in a speaker directory.
Official documentation and a client testing page are linked from the README.
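Since speaker verification expects 16 kHz, mono, 16-bit WAV files, it can help to check enrollment samples before placing them in the speaker directory. The following is a minimal sketch using Python's standard `wave` module; the function name `check_speaker_wav` is hypothetical and not part of the project.

```python
import wave
from typing import List


def check_speaker_wav(path: str) -> List[str]:
    """Return a list of format problems; an empty list means the file conforms."""
    problems: List[str] = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000 Hz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        # Sample width is reported in bytes: 2 bytes = 16-bit.
        if wav.getsampwidth() != 2:
            problems.append(f"{wav.getsampwidth() * 8}-bit samples, expected 16-bit")
    return problems
```

A file that fails the check can be converted with ffmpeg, which the install steps already add to the environment, for example `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav`.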

Highlighted Details

Supports both a REST API for single-file transcription and a WebSocket endpoint for real-time streaming.
Speaker verification can be enabled via a query parameter (sv=1) on the WebSocket endpoint.
Recent updates include optimized speaker verification and log-probability output for recognition confidence.
The roadmap lists latency optimization as planned work.
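As an illustration of the `sv=1` query parameter, the sketch below adds it to a WebSocket URL using only the standard library. The helper name `with_speaker_verification` is hypothetical, and the host, port, and path in the usage note are placeholders; the actual endpoint path should be taken from the README.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_speaker_verification(ws_url: str, enabled: bool = True) -> str:
    """Return ws_url with sv=1 set (or with sv removed when enabled is False)."""
    parts = urlsplit(ws_url)
    # Keep any query parameters already present on the URL.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    if enabled:
        query["sv"] = "1"
    else:
        query.pop("sv", None)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_speaker_verification("ws://localhost:8000/ws/example")` returns `ws://localhost:8000/ws/example?sv=1`. Removing the parameter rather than sending `sv=0` avoids assuming how the server interprets other values.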

Maintenance & Community

The project welcomes contributions and provides channels for bug reports and feature requests. The README does not mention community links (Discord/Slack) or social handles.

Licensing & Compatibility

This project is licensed under the MIT License, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

The README describes the project as still under development, with latency optimization listed as a future enhancement. It does not specify hardware requirements; ffmpeg is the only system dependency it names.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

10 stars in the last 30 days