FireRedVAD by FireRedTeam

Industrial-grade VAD and AED for diverse audio events

Created 1 month ago
347 stars

Top 80.0% on SourcePulse

Project Summary

FireRedVAD is a state-of-the-art, industrial-grade solution for Voice Activity Detection (VAD) and Audio Event Detection (AED) that delivers robust performance across 100+ languages. It targets engineers and researchers who need a VAD tool that significantly outperforms existing options, enabling high-accuracy, multilingual audio analysis applications.

How It Works

The system uses a DFSMN (Deep Feedforward Sequential Memory Network) architecture for both non-streaming and streaming VAD, plus a non-streaming AED model. Together, these models identify speech, singing, and music segments with high precision. The project's main claimed advantages are state-of-the-art benchmark results and broad multilingual coverage.
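The summary names DFSMN but gives no layer details. As a rough illustration of the core idea, the sketch below implements a simplified FSMN-style memory block: each output frame adds per-dimension weighted copies of the current frame, a few past frames (lookback), and optionally a few future frames (lookahead) to the input frame. This is a generic sketch of the published FSMN/DFSMN idea, not FireRedVAD's code; the function name, the stride of 1, zero padding at the edges, and the residual path on the input are simplifying assumptions. Setting the lookahead to zero frames makes the block causal, which is one way a model of this family can run in streaming mode.

```python
from typing import List

Matrix = List[List[float]]  # one row per frame, one column per hidden dimension


def fsmn_memory_block(h: Matrix, lookback: Matrix, lookahead: Matrix) -> Matrix:
    """Simplified FSMN-style memory block (stride 1) with a residual path.

    h:         hidden activations, T frames x D dims.
    lookback:  N1+1 rows of per-dimension weights for frames t, t-1, ..., t-N1.
    lookahead: N2 rows of per-dimension weights for frames t+1, ..., t+N2.
    Frames outside [0, T) contribute nothing (zero padding).
    """
    T = len(h)
    D = len(h[0]) if T else 0
    out: Matrix = []
    for t in range(T):
        p = list(h[t])  # residual path: start from the input frame itself
        # Past context (including the current frame at i = 0).
        for i, a in enumerate(lookback):
            if t - i >= 0:
                for d in range(D):
                    p[d] += a[d] * h[t - i][d]
        # Future context; an empty lookahead makes the block causal.
        for j, c in enumerate(lookahead, start=1):
            if t + j < T:
                for d in range(D):
                    p[d] += c[d] * h[t + j][d]
        out.append(p)
    return out
```

For example, fsmn_memory_block([[1.0], [2.0], [3.0]], [[0.5], [0.25]], [[1.0]]) returns [[3.5], [6.25], [5.0]]. With lookahead=[], changing a future frame never alters earlier outputs, so each output can be emitted as soon as its frame arrives; a non-zero lookahead trades latency for extra context.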

Quick Start & Requirements

Install via pip (pip install fireredvad, or pip install fireredvad[gpu] for GPU support) or from source. Requirements are Python 3.10+, PyTorch (GPU support optional), and ffmpeg for converting audio formats. Models can be downloaded from ModelScope or Hugging Face. Input audio must be 16 kHz, 16-bit, mono PCM WAV. The README references a technical report on arXiv (2603.10420) and mentions links to a paper, models, and a demo, but does not give their URLs.
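Because the models expect 16 kHz, 16-bit, mono PCM WAV, it can help to validate inputs before inference and convert anything else with ffmpeg. The sketch below uses Python's standard wave module (which raises wave.Error on non-PCM encodings) and builds an ffmpeg command from standard ffmpeg flags. The helper names check_wav_format and ffmpeg_to_vad_wav_cmd are hypothetical and are not part of the fireredvad package.

```python
import wave
from typing import BinaryIO, List, Union


def check_wav_format(src: Union[str, BinaryIO]) -> List[str]:
    """Return a list of problems; an empty list means 16 kHz, 16-bit, mono PCM WAV.

    wave.open raises wave.Error for non-PCM encodings (e.g. float WAV).
    """
    problems: List[str] = []
    with wave.open(src, "rb") as wf:
        if wf.getframerate() != 16000:
            problems.append(f"sample rate is {wf.getframerate()} Hz, expected 16000")
        if wf.getsampwidth() != 2:
            problems.append(f"sample width is {8 * wf.getsampwidth()} bits, expected 16")
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected 1 (mono)")
    return problems


def ffmpeg_to_vad_wav_cmd(src: str, dst: str) -> List[str]:
    """Build (but do not run) an ffmpeg command producing 16 kHz, 16-bit, mono PCM WAV."""
    return [
        "ffmpeg", "-y",        # overwrite dst if it already exists
        "-i", src,
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]
```

Run the command with subprocess.run(cmd, check=True), then re-check the output file before passing it to FireRedVAD.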

Highlighted Details

  • Achieves 97.57% F1 score and 99.60% AUC-ROC on the FLEURS-VAD-102 benchmark.
  • Supports detection of speech, singing, and music across over 100 languages.
  • Offers both non-streaming and streaming VAD, plus non-streaming AED capabilities.
  • Models are compact, with the VAD model being approximately 2.2 MB.
  • NCNN runtime support is available for cross-platform deployment.
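The summary does not describe how the FLEURS-VAD-102 F1 and AUC-ROC figures are computed (frame rate, frame-level versus segment-level scoring, decision threshold). The sketch below shows one common frame-level formulation, assuming per-frame reference labels (1 = speech, 0 = non-speech), binary predictions obtained by thresholding model probabilities, and the raw probabilities for AUC. The function names are hypothetical, and this is not the benchmark's official scorer.

```python
from typing import Sequence, Tuple


def frame_f1(ref: Sequence[int], hyp: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over per-frame speech labels (1 = speech)."""
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have the same number of frames")
    tp = sum(1 for r, h in zip(ref, hyp) if r == 1 and h == 1)
    fp = sum(1 for r, h in zip(ref, hyp) if r == 0 and h == 1)
    fn = sum(1 for r, h in zip(ref, hyp) if r == 1 and h == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(ref: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random speech frame outscores a random
    non-speech frame, counting ties as half a win."""
    pos = [s for r, s in zip(ref, scores) if r == 1]
    neg = [s for r, s in zip(ref, scores) if r == 0]
    if not pos or not neg:
        raise ValueError("need both speech and non-speech frames")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, roc_auc([0, 1, 1, 0], [0.1, 0.9, 0.4, 0.4]) returns 0.875. The pairwise AUC loop costs P*N comparisons and suits only small examples; a sort-based rank computation is the usual choice for long recordings.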

Maintenance & Community

The project, developed by FireRedTeam, shows recent activity with releases in March 2026. No specific community channels (e.g., Discord, Slack) or detailed contributor information are provided in the README.

Licensing & Compatibility

The license for FireRedVAD is not stated in the README text this summary is based on. Users should verify the terms for commercial use and for compatibility with closed-source projects before adopting it.

Limitations & Caveats

The FLEURS-VAD-102 test set is listed as "coming soon," so the headline benchmark results cannot yet be independently reproduced. All audio inputs must be converted to 16 kHz, 16-bit, mono PCM WAV beforehand. The licensing terms for commercial applications still need to be confirmed.

Health Check
Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 2
Issues (30d): 4
Star History: 71 stars in the last 30 days

Explore Similar Projects

Starred by Jiaming Song (Chief Scientist at Luma AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

RealtimeSTT by KoljaB

0.8%
10k
Speech-to-text library for realtime applications
Created 2 years ago
Updated 4 weeks ago