FireRedVAD by FireRedTeam

Industrial-grade VAD and AED for diverse audio events

Created 4 months ago

457 stars

Top 65.3% on SourcePulse

Project Summary

FireRedVAD is an industrial-grade toolkit for Voice Activity Detection (VAD) and Audio Event Detection (AED) that reports state-of-the-art results and supports more than 100 languages. It targets engineers and researchers who need higher accuracy than existing VAD tools provide, together with broad multilingual coverage, for audio analysis applications.

How It Works

The system uses a DFSMN-based (Deep Feedforward Sequential Memory Network) architecture for both non-streaming and streaming VAD, and adds non-streaming AED. The models identify speech, singing, and music segments. The main advantages claimed are state-of-the-art benchmark results and support for more than 100 languages.
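VAD models of this kind typically emit one speech probability per frame, which is then turned into time segments. The summary does not describe FireRedVAD's own post-processing, so the following is only a generic sketch of that step. The function name, the 0.5 threshold, and the 10 ms frame shift are all assumptions chosen for illustration, not values taken from the project.

```python
from typing import List, Tuple


def probs_to_segments(
    probs: List[float],
    threshold: float = 0.5,      # assumed decision threshold (hypothetical)
    frame_shift_ms: int = 10,    # assumed frame shift (hypothetical)
) -> List[Tuple[int, int]]:
    """Group consecutive frames at or above the threshold into (start_ms, end_ms) segments."""
    segments: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the currently open segment
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i  # a new segment opens at this frame
        elif not active and start is not None:
            # the segment closes at the start of the first inactive frame
            segments.append((start * frame_shift_ms, i * frame_shift_ms))
            start = None
    if start is not None:
        # the audio ended while a segment was still open
        segments.append((start * frame_shift_ms, len(probs) * frame_shift_ms))
    return segments
```

A production pipeline would usually also smooth the probabilities and enforce minimum speech and silence durations; those steps are omitted here to keep the sketch short.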

Quick Start & Requirements

Install via pip (pip install fireredvad or pip install fireredvad[gpu]) or from source. Prerequisites are Python 3.10+, PyTorch (GPU support optional), and ffmpeg for audio format conversion. Models can be downloaded from ModelScope or Hugging Face. Audio inputs must be 16 kHz, 16-bit, mono PCM WAV files. The project references a technical report on arXiv (2603.10420) and mentions links to a paper, models, and a demo, though direct URLs are not provided in the README.
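Because the input format is strict, it can help to check a file before passing it to the model. The sketch below uses only Python's standard wave module to verify the three stated properties, and builds an ffmpeg command for converting other inputs. The helper names are hypothetical and not part of the fireredvad package; the ffmpeg options shown (-ar, -ac, -c:a pcm_s16le) are standard ffmpeg flags.

```python
import wave


def is_supported_wav(path: str) -> bool:
    """Return True if the file is 16 kHz, 16-bit, mono, uncompressed PCM WAV."""
    # wave.open raises wave.Error for files it cannot parse (e.g. non-PCM WAV)
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000      # 16 kHz sample rate
            and wav.getsampwidth() == 2      # 2 bytes per sample = 16-bit
            and wav.getnchannels() == 1      # mono
            and wav.getcomptype() == "NONE"  # uncompressed PCM
        )


def ffmpeg_convert_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg argument list that resamples src to the required format."""
    return [
        "ffmpeg", "-i", src,
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to one channel
        "-c:a", "pcm_s16le",   # signed 16-bit little-endian PCM
        dst,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where ffmpeg is installed.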

Highlighted Details

Achieves 97.57% F1 score and 99.60% AUC-ROC on the FLEURS-VAD-102 benchmark.
Supports detection of speech, singing, and music in more than 100 languages.
Offers both non-streaming and streaming VAD, plus non-streaming AED capabilities.
Models are compact; the VAD model is approximately 2.2 MB.
NCNN runtime support is available for multiplatform deployment.
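For readers unfamiliar with the F1 metric quoted above, the following sketch shows how it is computed from binary labels. It assumes frame-level scoring with 1 for speech and 0 for non-speech; the summary does not say whether the FLEURS-VAD-102 figure is computed per frame or per segment, so treat this as an illustration of the formula rather than the benchmark's exact protocol.

```python
from typing import Sequence


def f1_score(reference: Sequence[int], hypothesis: Sequence[int]) -> float:
    """Compute F1 for binary labels (1 = speech, 0 = non-speech)."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    pairs = list(zip(reference, hypothesis))
    tp = sum(1 for r, h in pairs if r == 1 and h == 1)  # speech correctly detected
    fp = sum(1 for r, h in pairs if r == 0 and h == 1)  # false alarms
    fn = sum(1 for r, h in pairs if r == 1 and h == 0)  # missed speech
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)
```

F1 penalizes both false alarms and missed speech, which is why it is commonly reported alongside a threshold-independent measure such as AUC-ROC.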

Maintenance & Community

The project, developed by FireRedTeam, shows recent activity with releases in March 2026. No specific community channels (e.g., Discord, Slack) or detailed contributor information are provided in the README.

Licensing & Compatibility

The license for FireRedVAD is not stated in the provided text. The README references research papers but gives no licensing terms, so commercial use and compatibility with closed-source projects need clarification.

Limitations & Caveats

The FLEURS-VAD-102 test set is noted as "coming soon" for public release. All audio inputs must be pre-processed to 16 kHz, 16-bit, mono PCM WAV format. The licensing terms for commercial applications require further investigation.

