Open-source inference code for speech recognition and diarization models
Reverb provides open-source inference code for Rev's state-of-the-art speech recognition (ASR) and speaker diarization models. It targets researchers and developers needing high-performance audio processing, offering competitive results on long-form speech tasks and simplifying integration through Python packages and Docker.
How It Works
The ASR component is built upon the WeNet framework, while diarization leverages the Pyannote framework. This modular approach allows for specialized, high-quality implementations for each task. The code is designed for efficient inference and offers flexibility in output formats and decoding strategies, enabling fine-tuning for specific use cases.
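As a minimal sketch of the Pyannote side that the diarization component builds on (the checkpoint name and input file below are placeholders, not Reverb's own model ids):

from pyannote.audio import Pipeline

# Load a diarization pipeline; substitute the appropriate Reverb checkpoint id.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",   # placeholder checkpoint, assumption
    use_auth_token="YOUR_HUGGINGFACE_ACCESS_TOKEN",
)

# Run diarization on a local audio file (hypothetical filename).
diarization = pipeline("meeting.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")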
Quick Start & Requirements
Install the Python package from the repository root:
pip install .
Alternatively, build the Docker image, passing a HuggingFace access token so the models can be downloaded:
docker build -t reverb . --build-arg HUGGINGFACE_ACCESS_TOKEN=${YOUR_HUGGINGFACE_ACCESS_TOKEN}
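The access token is used to pull model weights from HuggingFace. A minimal sketch of fetching weights directly, assuming the repository id "Revai/reverb-asr" (check the model cards linked from the README for the exact identifiers):

from huggingface_hub import snapshot_download

# Download the model snapshot to the local cache and print its path.
model_dir = snapshot_download(
    repo_id="Revai/reverb-asr",               # assumed repo id
    token="YOUR_HUGGINGFACE_ACCESS_TOKEN",     # needed for gated repositories
)
print(model_dir)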
Highlighted Details
Decoding strategies include options such as ctc_prefix_beam_search.
Maintenance & Community
The project lists several contributors from Rev. Further details on community or roadmap are not explicitly provided in the README.
Licensing & Compatibility
The license applies to the code; model licenses are separate and available on HuggingFace. Compatibility for commercial use or closed-source linking depends on the specific model licenses.
Limitations & Caveats
The README notes potential conflicts if another wenet installation exists in the environment. The project is presented as inference code, with a separate repository (reverb-self-hosted) suggested for large-scale, offline deployments.