oliverguhr
Live speech recognition demo using wav2vec 2.0
Top 75.9% on SourcePulse
This project provides a Python library for real-time speech recognition using Hugging Face's wav2vec 2.0 models. It lets users run various pre-trained wav2vec 2.0 models on live microphone input, enabling applications such as live transcription or voice command interfaces.
How It Works
The library uses the wav2vec 2.0 model architecture for speech-to-text conversion. It captures audio from the system's default microphone, processes it in chunks, and feeds each chunk to the specified wav2vec 2.0 model for inference. Any wav2vec 2.0 model available on the Hugging Face Hub can be used, and the model is downloaded automatically on first use.
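The chunk-then-infer flow described above can be sketched roughly as follows. This is not the project's actual code: the helper names (`pcm16_to_float`, `iter_chunks`, `transcribe`), the fixed-size chunking, and the 16 kHz sample rate are assumptions made for illustration. Only the `transformers` calls (`Wav2Vec2Processor`, `Wav2Vec2ForCTC`) are real library APIs.

```python
from array import array
from typing import Iterator, List

# Assumption: 16 kHz mono input, the rate common wav2vec 2.0 checkpoints expect.
SAMPLE_RATE = 16_000


def pcm16_to_float(raw: bytes) -> List[float]:
    """Convert 16-bit signed PCM bytes (native byte order) to floats in [-1, 1)."""
    samples = array("h")
    samples.frombytes(raw)
    return [s / 32768.0 for s in samples]


def iter_chunks(samples: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield consecutive chunks of at most chunk_size samples (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def transcribe(samples: List[float], model_name: str) -> str:
    """Run one chunk through a wav2vec 2.0 CTC model from the Hugging Face Hub."""
    # Imported lazily so the helpers above work without these packages installed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # from_pretrained downloads the model on first use and caches it afterwards.
    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)

    inputs = processor(samples, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # Greedy CTC decoding: pick the most likely token at each frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

In a live setting, the raw bytes would come from the microphone stream rather than a buffer, and the model and processor would be loaded once and reused across chunks instead of inside `transcribe`.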
Quick Start & Requirements
Install: pip install -r requirements.txt after setting up a virtual environment.
Prerequisite: portaudio19-dev on Ubuntu, needed for pyaudio.
Run: python live_asr.py
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project relies on the system's default audio device, requiring manual configuration if it's not set correctly. The README mentions a potential "attempt to connect to server failed" message from pyaudio which can be safely ignored if the JACK audio server is not in use.
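To check which device the system treats as an input before adjusting the default, something like the following may help. It is a sketch, not part of the project: `input_devices` and `query_devices` are hypothetical helper names, while the `pyaudio` calls (`get_device_count`, `get_device_info_by_index`, `terminate`) are part of the PyAudio API.

```python
from typing import Dict, List


def input_devices(device_infos: List[Dict]) -> List[str]:
    """Return 'index: name' labels for devices that can record audio."""
    return [
        f"{info['index']}: {info['name']}"
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def query_devices() -> List[Dict]:
    """Collect device info dicts from PortAudio via pyaudio."""
    # Imported lazily; requires pyaudio and the PortAudio library to be installed.
    import pyaudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()
```

Calling `input_devices(query_devices())` lists the recording-capable devices, which can then be compared against the default selected in the operating system's sound settings.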
1 year ago
Inactive