Open-source inference code for speech recognition and diarization models
Reverb provides open-source inference code for Rev's state-of-the-art speech recognition (ASR) and speaker diarization models. It targets researchers and developers needing high-performance audio processing, offering competitive results on long-form speech tasks and simplifying integration through Python packages and Docker.
How It Works
The ASR component is built upon the WeNet framework, while diarization leverages the Pyannote framework. This modular approach allows for specialized, high-quality implementations for each task. The code is designed for efficient inference and offers flexibility in output formats and decoding strategies, enabling fine-tuning for specific use cases.
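As a minimal sketch of the Pyannote side that the diarization component builds on (the checkpoint name and input file below are placeholders, not Reverb's own model ids):

from pyannote.audio import Pipeline

# Load a diarization pipeline; substitute the appropriate Reverb checkpoint id.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",   # placeholder checkpoint, assumption
    use_auth_token="YOUR_HUGGINGFACE_ACCESS_TOKEN",
)

# Run diarization on a local audio file (hypothetical filename).
diarization = pipeline("meeting.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")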
Quick Start & Requirements
Install the Python package from the repository root:
pip install .
Alternatively, build the Docker image, passing a HuggingFace access token so the models can be downloaded:
docker build -t reverb . --build-arg HUGGINGFACE_ACCESS_TOKEN=${YOUR_HUGGINGFACE_ACCESS_TOKEN}
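The access token is used to pull model weights from HuggingFace. A minimal sketch of fetching weights directly, assuming the repository id "Revai/reverb-asr" (check the model cards linked from the README for the exact identifiers):

from huggingface_hub import snapshot_download

# Download the model snapshot to the local cache and print its path.
model_dir = snapshot_download(
    repo_id="Revai/reverb-asr",               # assumed repo id
    token="YOUR_HUGGINGFACE_ACCESS_TOKEN",     # needed for gated repositories
)
print(model_dir)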
Highlighted Details
Decoding strategies include options such as ctc_prefix_beam_search.
Maintenance & Community
The project lists several contributors from Rev. Further details on community or roadmap are not explicitly provided in the README.
Licensing & Compatibility
The license applies to the code; model licenses are separate and available on HuggingFace. Compatibility for commercial use or closed-source linking depends on the specific model licenses.
Limitations & Caveats
The README notes potential conflicts if another wenet installation exists in the environment. The project is presented as inference code, with a separate repository (reverb-self-hosted) suggested for large-scale, offline deployments.