wav2vec2-live  by oliverguhr

Live speech recognition demo using wav2vec 2.0

Created 4 years ago
367 stars

Top 76.8% on SourcePulse

Project Summary

This project provides a Python library for real-time speech recognition with wav2vec 2.0 models hosted on the Hugging Face Hub. Users can run any of the available pre-trained wav2vec 2.0 models on live microphone input, which enables applications such as live transcription or voice command interfaces.

How It Works

The library utilizes the wav2vec2 model architecture for speech-to-text conversion. It captures audio input from the system's default microphone, processes it in chunks, and feeds it to the specified wav2vec 2.0 model for inference. The approach allows for flexible use of any model available on the Hugging Face Hub, with automatic downloading on first use.
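The capture, chunk, and infer flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the 16 kHz sample rate, the 30 ms chunk size, and the helper names split_into_chunks, pcm_to_floats, and load_transcriber are all assumptions made for the example. The model call uses the standard transformers classes Wav2Vec2Processor and Wav2Vec2ForCTC, imported lazily so the chunking helpers work without them.

```python
import struct
from typing import Callable, Iterator, List

SAMPLE_RATE = 16_000   # assumed: wav2vec 2.0 checkpoints usually expect 16 kHz audio
CHUNK_MS = 30          # assumed chunk duration, chosen for illustration only
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM


def split_into_chunks(pcm: bytes, chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield fixed-size chunks of raw PCM; a trailing partial chunk is dropped."""
    chunk_bytes = SAMPLE_RATE * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm) - chunk_bytes + 1, chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def pcm_to_floats(pcm: bytes) -> List[float]:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(pcm) // BYTES_PER_SAMPLE
    ints = struct.unpack(f"<{count}h", pcm[:count * BYTES_PER_SAMPLE])
    return [sample / 32768.0 for sample in ints]


def load_transcriber(model_name: str) -> Callable[[List[float]], str]:
    """Build a transcribe(samples) function for a Hub model.

    The weights are downloaded from the Hugging Face Hub on first use.
    Requires torch and transformers, imported here so the helpers above
    stay usable without them.
    """
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)

    def transcribe(samples: List[float]) -> str:
        inputs = processor(samples, sampling_rate=SAMPLE_RATE, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits
        predicted_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
        return processor.batch_decode(predicted_ids)[0]

    return transcribe
```

In this sketch, buffered chunks would be joined, passed through pcm_to_floats, and handed to the function returned by load_transcriber("facebook/wav2vec2-base-960h") or any other wav2vec 2.0 checkpoint on the Hub.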

Quick Start & Requirements

  • Set up a virtual environment, then install the dependencies with pip install -r requirements.txt.
  • Requires portaudio19-dev on Ubuntu for pyaudio.
  • Usage example: python live_asr.py
  • Official documentation and demo links are not provided in the README.

Highlighted Details

  • Supports any wav2vec 2.0 model from the Hugging Face model hub.
  • Processes audio directly from the system's default audio device.
  • Reports the inference time and sample length alongside each transcribed text.
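As a rough illustration of the last point, the sketch below wraps an arbitrary transcription function and returns the text together with the sample length and the inference time, both in seconds. The name timed_transcribe and the 16 kHz default are assumptions for this example and are not taken from the project.

```python
import time
from typing import Callable, Sequence, Tuple


def timed_transcribe(
    transcribe: Callable[[Sequence[float]], str],
    samples: Sequence[float],
    sample_rate: int = 16_000,  # assumed sample rate
) -> Tuple[str, float, float]:
    """Return (text, sample_length_seconds, inference_time_seconds)."""
    sample_length = len(samples) / sample_rate
    start = time.perf_counter()
    text = transcribe(samples)
    inference_time = time.perf_counter() - start
    return text, sample_length, inference_time


if __name__ == "__main__":
    # A stand-in transcriber; a real one would run a wav2vec 2.0 model.
    text, length, elapsed = timed_transcribe(lambda s: "hello", [0.0] * 32_000)
    print(f"{length:.3f}s\t{elapsed:.3f}s\t{text}")
```

Comparing the two numbers shows whether inference keeps up with the audio: when the inference time stays below the sample length, the model runs faster than real time.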

Maintenance & Community

  • No information on contributors, sponsorships, community channels, or roadmap is available in the README.

Licensing & Compatibility

  • The README does not specify a license.

Limitations & Caveats

The project reads from the system's default audio device, so the default must be changed manually if it does not point to the intended microphone. The README also mentions that pyaudio may print an "attempt to connect to server failed" message, which can be safely ignored if the JACK audio server is not in use.
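When the default device is in doubt, it can help to list the available inputs first. The sketch below uses PyAudio's device enumeration calls (get_device_count and get_device_info_by_index); the helper names list_input_devices and find_input_device are hypothetical, and how a chosen device would be passed to the project itself is not covered here.

```python
from typing import Dict, List, Optional


def list_input_devices() -> List[Dict]:
    """Return PyAudio device-info dicts for devices that can record audio."""
    import pyaudio  # imported lazily; requires portaudio to be installed

    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()
    return [info for info in infos if info.get("maxInputChannels", 0) > 0]


def find_input_device(devices: List[Dict], name_fragment: str) -> Optional[int]:
    """Return the index of the first input device whose name contains the fragment."""
    wanted = name_fragment.lower()
    for info in devices:
        is_input = info.get("maxInputChannels", 0) > 0
        if is_input and wanted in str(info["name"]).lower():
            return int(info["index"])
    return None


if __name__ == "__main__":
    for info in list_input_devices():
        print(info["index"], info["name"])
```

The printed names can then be compared against the operating system's sound settings to confirm which microphone is the default.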

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days
