ten-vad by TEN-framework

Low-latency voice activity detection for real-time AI

Created 10 months ago

1,991 stars

Top 21.7% on SourcePulse

Project Summary

TEN VAD is a low-latency voice activity detector (VAD) for real-time conversational AI. It targets developers building voice-enabled applications, and the project reports better accuracy and lower computational cost than common alternatives such as WebRTC VAD and Silero VAD.

How It Works

TEN VAD uses a proprietary architecture designed to detect speech-to-non-speech transitions quickly. Faster transition detection reduces end-to-end latency in conversational AI systems. The detector also picks up short silences between adjacent speech segments, which the project describes as a common failure point for other VADs.
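To make the idea of transition detection concrete, here is a minimal sketch of how per-frame speech flags could be grouped into speech segments. It is generic post-processing, not TEN VAD's internal method. The function name `frames_to_segments`, the flag list, and the 16 ms frame duration are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def frames_to_segments(flags: List[int], hop_ms: float) -> List[Tuple[float, float]]:
    """Group consecutive speech frames (flag == 1) into (start_ms, end_ms) segments.

    Hypothetical helper: `flags` stands in for the per-frame decisions a
    frame-level VAD would produce; `hop_ms` is the assumed frame duration.
    """
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current speech run
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i  # non-speech -> speech transition
        elif not flag and start is not None:
            segments.append((start * hop_ms, i * hop_ms))  # speech -> non-speech
            start = None
    if start is not None:
        # Speech ran until the end of the input.
        segments.append((start * hop_ms, len(flags) * hop_ms))
    return segments


# Two speech runs separated by a two-frame (32 ms) silence.
flags = [0, 1, 1, 1, 0, 0, 1, 1, 0]
print(frames_to_segments(flags, hop_ms=16.0))
```

A detector that misses the two silent frames in the middle would report a single long segment here. One that flags them yields two segments, which lets a downstream system react at the first speech-to-non-speech transition.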

Quick Start & Requirements

Installation: git clone https://github.com/TEN-framework/ten-vad.git
Python Usage: pip install -r requirements.txt (for the examples and plotting); pip install -U --force-reinstall -v git+https://github.com/TEN-framework/ten-vad.git (for direct use as a package).
Dependencies: Python (verified with 3.8.19 and 3.10.14), numpy, scipy, scikit-learn, matplotlib, torchaudio. ONNX usage requires onnxruntime >= 1.17.1. C/C++ usage requires CMake and a platform toolchain (Clang, Visual Studio, or Xcode).
Platforms: Linux, Windows, macOS, Android, iOS, Web (WASM/JS).
Resources: Setup time varies by platform; the core library is small (e.g., 306 KB on Linux x64).
Demo: Hugging Face Space (linked from the repository README); demo asset hosted on GitHub: https://github.com/user-attachments/assets/725a8318-d679-4b17-b9e4-e3dce999b298

Highlighted Details

Reports better precision-recall performance than WebRTC VAD and Silero VAD on benchmark datasets.
Detects speech-to-non-speech transitions with significantly lower latency than Silero VAD.
Has substantially lower computational complexity and a smaller library size than Silero VAD across multiple platforms.
Offers cross-platform C compatibility, with bindings for Python, JS (WASM), Android, and iOS.

Maintenance & Community

Under active development; recent updates include integration into k2-fsa/sherpa-onnx and the release of ONNX models.
Community channels: Discord, X, LinkedIn, Hugging Face.

Licensing & Compatibility

Licensed under Apache 2.0.
Includes code derived from LPCNet, which is licensed under BSD-2-Clause and BSD-3-Clause (see the NOTICES file for details).

Limitations & Caveats

Expects 16 kHz audio; inputs at other sampling rates must be resampled first.
The default threshold of 0.5 may require tuning for specific applications.
iOS usage requires manually embedding the framework and configuring signing for the target device in Xcode.
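The first two caveats can be illustrated with a short sketch. It computes the integer up/down factors needed to bring a given sampling rate to 16 kHz, then shows how moving the decision threshold changes which frames count as speech. The helper names, the sample probabilities, and the `>=` comparison are assumptions for illustration. The library's own comparison rule is not stated here.

```python
from math import gcd
from typing import List, Tuple

TARGET_RATE_HZ = 16_000  # sampling rate the detector expects


def resample_factors(src_rate_hz: int, dst_rate_hz: int = TARGET_RATE_HZ) -> Tuple[int, int]:
    """Return reduced (up, down) integer factors so that src * up / down == dst.

    Hypothetical helper: the pair can be handed to a polyphase resampler
    such as scipy.signal.resample_poly(x, up, down).
    """
    g = gcd(src_rate_hz, dst_rate_hz)
    return dst_rate_hz // g, src_rate_hz // g


def apply_threshold(probs: List[float], threshold: float = 0.5) -> List[int]:
    """Map per-frame speech probabilities to 0/1 flags (assumes p >= threshold means speech)."""
    return [1 if p >= threshold else 0 for p in probs]


print(resample_factors(44_100))  # (160, 441)
print(resample_factors(48_000))  # (1, 3)

probs = [0.1, 0.45, 0.55, 0.9]  # made-up probabilities, not real model output
for t in (0.4, 0.5, 0.6):
    print(t, apply_threshold(probs, t))
```

Lowering the threshold marks more borderline frames as speech (fewer missed words, more false triggers), and raising it does the opposite. A sweep like this over labeled audio from the target application is one way to pick a value other than 0.5.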

Health Check

Last Commit: 3 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 3

Star History

62 stars in the last 30 days
