wav2vec2-live by oliverguhr

Live speech recognition demo using wav2vec 2.0

created 4 years ago
362 stars

Top 78.7% on sourcepulse

Project Summary

This project provides a Python library for real-time speech recognition with Hugging Face's wav2vec 2.0 models. It lets users run any pre-trained wav2vec 2.0 model on live microphone input, enabling applications such as live transcription or voice command interfaces.

How It Works

The library uses the wav2vec 2.0 architecture for speech-to-text conversion. It captures audio from the system's default microphone, processes it in chunks, and feeds each chunk to the chosen wav2vec 2.0 model for inference. Any model available on the Hugging Face Hub can be used, and it is downloaded automatically on first use.
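A minimal sketch of this chunked approach, assuming the transformers, torch, numpy, and pyaudio packages are installed; the model id, chunk length, and greedy decoding are illustrative choices, not taken from the project's own code:

    import numpy as np
    import pyaudio
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    MODEL_NAME = "facebook/wav2vec2-base-960h"   # any wav2vec 2.0 CTC model id
    SAMPLE_RATE = 16000                          # wav2vec 2.0 expects 16 kHz audio
    CHUNK_SECONDS = 2                            # length of each captured chunk

    processor = Wav2Vec2Processor.from_pretrained(MODEL_NAME)  # downloads on first use
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_NAME)

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                     input=True, frames_per_buffer=SAMPLE_RATE * CHUNK_SECONDS)

    try:
        while True:
            # Capture one chunk from the default input device and scale to float32.
            raw = stream.read(SAMPLE_RATE * CHUNK_SECONDS)
            audio = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0

            # Run CTC inference and greedy-decode the transcription.
            inputs = processor(audio, sampling_rate=SAMPLE_RATE, return_tensors="pt")
            with torch.no_grad():
                logits = model(inputs.input_values).logits
            print(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
    except KeyboardInterrupt:
        stream.stop_stream()
        stream.close()
        pa.terminate()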

Quick Start & Requirements

  • Install via pip install -r requirements.txt after setting up a virtual environment.
  • Requires portaudio19-dev on Ubuntu for pyaudio.
  • Usage example: python live_asr.py (a Python usage sketch follows this list).
  • Official documentation and demo links are not provided in the README.
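
Beyond running live_asr.py as a script, the module can presumably be used from Python. The sketch below is illustrative only: the LiveWav2Vec2 class name, its constructor arguments, and the get_last_text() return tuple are assumptions based on this summary's description, not confirmed API.

    # Hypothetical import; the actual class and module names may differ.
    from live_asr import LiveWav2Vec2

    # Any wav2vec 2.0 model id from the Hugging Face Hub should work here.
    asr = LiveWav2Vec2("facebook/wav2vec2-base-960h", device_name="default")
    asr.start()
    try:
        while True:
            # Assumed to return the transcription plus sample length and inference time.
            text, sample_length, inference_time = asr.get_last_text()
            print(f"{sample_length:.3f}s\t{inference_time:.3f}s\t{text}")
    except KeyboardInterrupt:
        asr.stop()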

Highlighted Details

  • Supports any wav2vec 2.0 model from the Hugging Face model hub.
  • Processes audio directly from the system's default audio device.
  • Provides real-time inference time and sample length alongside transcribed text.

Maintenance & Community

  • No information on contributors, sponsorships, community channels, or roadmap is available in the README.

Licensing & Compatibility

  • The README does not specify a license.

Limitations & Caveats

The project reads from the system's default audio device, so the correct microphone must be configured manually if the default is wrong (a device-listing sketch follows). The README also notes that pyaudio may print an "attempt to connect to server failed" message, which can be safely ignored if the JACK audio server is not in use.
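
If the default device is not the intended microphone, the available input devices can be listed with pyaudio and the right one identified; this sketch is independent of whatever configuration mechanism the project itself offers, which this summary does not document:

    import pyaudio

    # Enumerate audio devices and print those that can capture input.
    pa = pyaudio.PyAudio()
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        if info.get("maxInputChannels", 0) > 0:
            print(f"{index}: {info['name']}")
    pa.terminate()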

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 90 days
