speechbox by huggingface

Speech processing tools (punctuation, diarization)

created 2 years ago
359 stars

Project Summary

🤗 Speechbox provides tools for speech processing, primarily punctuation restoration and automatic speech recognition (ASR) with speaker diarization. It targets developers and researchers working with audio data who want to add punctuation and speaker labels to transcriptions.

How It Works

Punctuation restoration uses Whisper models with constrained decoding. Given the audio and its unpunctuated transcript, the model is forced to reproduce the transcript's words in order but may change their capitalization, spacing, and punctuation. This relies on Whisper's strong language understanding to infer the correct punctuation.

The ASR with speaker diarization pipeline combines a speech recognition model (such as Whisper) with a speaker diarization model, so that each transcribed segment is attributed to a specific speaker.
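The constrained decoding idea behind punctuation restoration can be sketched in plain Python. This is a simplified illustration, not speechbox's implementation: it works on whole words rather than Whisper's subword tokens, uses greedy search, varies only casing and trailing punctuation (not spacing), and replaces Whisper with a hypothetical `score` function. The names `word_variants`, `restore_punctuation`, `toy_score`, and `PREFERRED` are illustrative assumptions.

```python
from typing import Callable, List, Sequence

# Trailing punctuation the decoder may attach to a forced word ("" = none).
PUNCTUATION = ("", ",", ".", "?", "!")
PUNCT_CHARS = "".join(PUNCTUATION)


def word_variants(word: str) -> List[str]:
    """Surface forms allowed for one forced word.

    The letters of the word are fixed; only casing and trailing
    punctuation may change.
    """
    base = word.lower()
    casings = sorted({base, base.capitalize()})
    return [c + p for c in casings for p in PUNCTUATION]


def restore_punctuation(
    text: str, score: Callable[[Sequence[str], str], float]
) -> str:
    """Greedy constrained decoding over a normalized transcript.

    `score(prefix, candidate)` is a hypothetical stand-in for a model's
    log-probability of emitting `candidate` after `prefix`; in the real
    tool this would come from Whisper conditioned on the audio.
    """
    output: List[str] = []
    for word in text.split():
        # Only variants of the next transcript word are allowed at this step.
        best = max(word_variants(word), key=lambda cand: score(output, cand))
        output.append(best)
    return " ".join(output)


# Toy scorer (hypothetical): a fixed table of preferred forms stands in
# for what an audio-conditioned model would predict.
PREFERRED = {"hello": "Hello,", "world": "world.", "how": "How", "are": "are", "you": "you?"}


def toy_score(prefix: Sequence[str], candidate: str) -> float:
    key = candidate.rstrip(PUNCT_CHARS).lower()
    return 1.0 if PREFERRED.get(key) == candidate else 0.0


# Uppercase, unpunctuated input mimics a normalized ASR transcript.
restored = restore_punctuation("HELLO WORLD HOW ARE YOU", toy_score)
print(restored)
```

Because each step may only choose among variants of the next transcript word, the scorer can change how a word is written but can never add, drop, or substitute words.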

Quick Start & Requirements

  • Install via pip: pip install speechbox
  • Additional dependencies for examples: pip install transformers datasets pyannote.audio
  • Requires PyTorch and Hugging Face Transformers.
  • GPU with CUDA is recommended for performance.
  • Official quick-start examples are available in the repository.

Highlighted Details

  • Supports punctuation restoration using various OpenAI Whisper models (tiny.en to medium.en).
  • Offers an ASR + Diarization pipeline for speaker-attributed transcriptions of long audio.
  • Includes Hugging Face Spaces web demos for both punctuation restoration and ASR + diarization.
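To produce speaker-attributed transcripts, the timestamped output of the ASR model has to be matched against the speaker turns produced by the diarization model. The sketch below assumes one simple strategy: assign each ASR chunk to the speaker whose turn overlaps it most, then merge consecutive chunks from the same speaker. speechbox's own pipeline may align segments differently. The data classes, function names, and speaker labels here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Chunk:  # one timestamped ASR output chunk (times in seconds)
    start: float
    end: float
    text: str


@dataclass
class SpeakerTurn:  # one diarization segment
    start: float
    end: float
    speaker: str


@dataclass
class LabeledSegment:  # speaker-attributed transcript segment
    speaker: str
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the time interval shared by [a_start, a_end] and [b_start, b_end]."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(chunks: Sequence[Chunk], turns: Sequence[SpeakerTurn]) -> List[LabeledSegment]:
    segments: List[LabeledSegment] = []
    for chunk in chunks:
        # Pick the speaker turn with the largest temporal overlap.
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            ov = overlap(chunk.start, chunk.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        speaker = best_speaker if best_speaker is not None else "UNKNOWN"

        if segments and segments[-1].speaker == speaker:
            # Same speaker as the previous chunk: extend that segment.
            last = segments[-1]
            last.end = chunk.end
            last.text = f"{last.text} {chunk.text}"
        else:
            segments.append(LabeledSegment(speaker, chunk.start, chunk.end, chunk.text))
    return segments


chunks = [Chunk(0.0, 2.0, "Hi there."), Chunk(2.0, 3.5, "How are you?"), Chunk(3.6, 5.0, "Fine, thanks.")]
turns = [SpeakerTurn(0.0, 3.4, "SPEAKER_00"), SpeakerTurn(3.4, 5.2, "SPEAKER_01")]
segments = assign_speakers(chunks, turns)
for seg in segments:
    print(f"{seg.speaker} [{seg.start:.1f}-{seg.end:.1f}]: {seg.text}")
```

Maximum-overlap assignment is easy to reason about, but a chunk that straddles a speaker change is attributed entirely to one speaker; finer word-level timestamps reduce that error.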

Maintenance & Community

  • 🚨 The package is not actively maintained. 🚨 The maintainers are seeking contributors.
  • Community discussion is encouraged in the "ML for Audio and Speech" channel on Discord.

Licensing & Compatibility

  • The README does not explicitly state a license. Check the repository for a LICENSE file before relying on any particular license; the license should not be assumed from the fact that it is a Hugging Face project.
  • Compatibility for commercial use is not specified.

Limitations & Caveats

  • The package is explicitly marked as not actively maintained, so bugs may go unaddressed and updates may not keep pace with its dependencies.
  • Punctuation restoration has only been tested on a limited set of Whisper models and a small audio dataset.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days
