Live speech recognition demo using wav2vec 2.0
Top 78.7% on sourcepulse
This project provides a Python library for real-time speech recognition using Hugging Face's wav2vec 2.0 models. It lets users run any pre-trained wav2vec 2.0 model against live microphone input, enabling applications such as live transcription or voice command interfaces.
How It Works
The library utilizes the wav2vec2 model architecture for speech-to-text conversion. It captures audio input from the system's default microphone, processes it in chunks, and feeds each chunk to the specified wav2vec 2.0 model for inference. Any model available on the Hugging Face Hub can be used, with automatic downloading on first use.
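To illustrate this flow, the sketch below (not the project's own code) captures short chunks from the default microphone with pyaudio and runs them through a wav2vec 2.0 model via Hugging Face transformers. The model name and the one-second chunk length are arbitrary choices for the example, not the project's defaults.

```python
# Minimal sketch of chunked microphone inference with wav2vec 2.0.
# Assumes pyaudio, numpy, torch and transformers are installed; the model
# name and chunk length below are illustrative, not the project's defaults.
import numpy as np
import pyaudio
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_NAME = "facebook/wav2vec2-base-960h"  # downloaded from the Hub on first use
SAMPLE_RATE = 16000                         # wav2vec 2.0 expects 16 kHz audio
CHUNK_SECONDS = 1.0                         # length of each audio chunk

processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)
model.eval()

audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                    input=True, frames_per_buffer=int(SAMPLE_RATE * CHUNK_SECONDS))

try:
    while True:
        # Read one chunk from the default microphone and scale to [-1, 1].
        raw = stream.read(int(SAMPLE_RATE * CHUNK_SECONDS), exception_on_overflow=False)
        samples = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0

        # Run CTC inference on the chunk and decode the predicted tokens to text.
        inputs = processor(samples, sampling_rate=SAMPLE_RATE, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits
        predicted_ids = torch.argmax(logits, dim=-1)
        text = processor.batch_decode(predicted_ids)[0]
        if text:
            print(text)
except KeyboardInterrupt:
    stream.stop_stream()
    stream.close()
    audio.terminate()
```

In practice, the chunk size trades latency against accuracy: shorter chunks transcribe sooner but give the model less context per pass.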
Quick Start & Requirements
Set up a virtual environment, then install dependencies with pip install -r requirements.txt.
On Ubuntu, install portaudio19-dev, which pyaudio requires.
Run the demo with python live_asr.py.
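Before launching the demo, it can help to confirm that pyaudio detects a default input device. The check below is a hedged sanity test, not part of the project.

```python
# Optional sanity check (not part of the project): confirm that pyaudio
# sees a default input device before running python live_asr.py.
import pyaudio

p = pyaudio.PyAudio()
try:
    info = p.get_default_input_device_info()
    print(f"Default input device: {info['name']} "
          f"({int(info['defaultSampleRate'])} Hz, {info['maxInputChannels']} channel(s))")
except IOError:
    print("No default input device found; check your system audio settings.")
finally:
    p.terminate()
```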
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project relies on the system's default audio device, requiring manual configuration if it is not set correctly. The README notes that pyaudio may print an "attempt to connect to server failed" message, which can be safely ignored if the JACK audio server is not in use.
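If the default device is wrong, pyaudio allows enumerating input devices and opening a stream on a specific one. The sketch below is an illustrative workaround under that assumption, not functionality exposed by this project.

```python
# Illustrative workaround (not part of this project): list capture devices
# and open a stream on an explicit device instead of the system default.
import pyaudio

p = pyaudio.PyAudio()

# Print every device that can capture audio, with its index.
for i in range(p.get_device_count()):
    dev = p.get_device_info_by_index(i)
    if dev["maxInputChannels"] > 0:
        print(f"{i}: {dev['name']}")

# Open a stream on a chosen device by passing input_device_index explicitly.
CHOSEN_INDEX = 0  # replace with the index of your microphone from the list above
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, input_device_index=CHOSEN_INDEX,
                frames_per_buffer=1024)
stream.close()
p.terminate()
```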
Last activity: 1 year ago; the project is marked Inactive.