speechbox by huggingface

Speech processing tools (punctuation, diarization)

Created 3 years ago

357 stars

Top 78.7% on SourcePulse

5 Experts Love This Project

Clement Delangue

Cofounder of Hugging Face

Philipp Schmid

DevRel at Google DeepMind

Omar Sanseviero

DevRel at Google DeepMind

Lewis Tunstall

Research Engineer at Hugging Face

and 1 more!

Project Summary

🤗 Speechbox provides tools for speech processing tasks, primarily punctuation restoration and ASR with speaker diarization. It targets developers and researchers working with audio data, offering a streamlined way to enhance transcriptions.

How It Works

Punctuation restoration leverages Whisper models by forcing them to predict specific words while allowing modifications to capitalization, spacing, and punctuation. This approach capitalizes on Whisper's strong language understanding to infer correct punctuation. The ASR with Speaker Diarization pipeline combines a speech recognition system (like Whisper) with a speaker diarization model to attribute speech segments to specific speakers.
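The constraint behind punctuation restoration can be illustrated without a model. The sketch below is a simplification and not Speechbox's implementation: Speechbox constrains Whisper during decoding, whereas this version filters finished candidates after the fact. The helper names and the candidate scores are hypothetical. It accepts only candidates whose letters and digits match the input transcript, so capitalization, spacing, and punctuation may change but words may not.

```python
def normalize(text: str) -> str:
    """Reduce text to its lowercase letters and digits only.

    Case, spacing, and punctuation are discarded, so two strings that
    differ only in those respects normalize to the same value.
    """
    return "".join(ch for ch in text.lower() if ch.isalnum())


def preserves_words(original: str, candidate: str) -> bool:
    """True if the candidate changes only case, spacing, or punctuation."""
    return normalize(original) == normalize(candidate)


def pick_restoration(original: str, scored_candidates: list[tuple[str, float]]) -> str:
    """Return the highest-scoring candidate that keeps the original words.

    `scored_candidates` holds (text, score) pairs, for example log
    probabilities from a language model (hypothetical values here).
    Falls back to the unmodified input if no candidate qualifies.
    """
    allowed = [(text, score) for text, score in scored_candidates
               if preserves_words(original, text)]
    if not allowed:
        return original
    return max(allowed, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    transcript = "hello how are you"
    candidates = [
        ("Hello, how are you?", -1.2),  # same words, new punctuation
        ("Hello, who are you?", -0.5),  # best score, but changes a word
        ("hello how are you", -3.0),    # unchanged input
    ]
    print(pick_restoration(transcript, candidates))
```

The second candidate has the best score but is rejected because it swaps "how" for "who", which is exactly the kind of edit the word constraint is meant to rule out.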

Quick Start & Requirements

Install via pip: pip install speechbox
Additional dependencies for examples: pip install transformers datasets pyannote.audio
Requires PyTorch and Hugging Face Transformers.
GPU with CUDA is recommended for performance.
Official quick-start examples are available in the repository.

Highlighted Details

Supports punctuation restoration using various OpenAI Whisper models (tiny.en to medium.en).
Offers an ASR + Diarization pipeline for speaker-attributed transcriptions of long audio.
Includes web demos (Spaces) for trying out the punctuation restoration and ASR+Diarization functionalities.
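An ASR + Diarization pipeline of the kind listed above has to merge two timestamped outputs: transcribed chunks from the recognizer and speaker turns from the diarizer. The sketch below shows one simple way to do that merge, assigning each chunk to the speaker turn it overlaps most. It is an assumption for illustration, not Speechbox's alignment logic, and the `Segment` type, function names, and sample timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str    # speaker id for a diarization turn, text for an ASR chunk


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time interval shared by two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def attribute_speakers(asr_chunks: list[Segment],
                       speaker_turns: list[Segment]) -> list[tuple[str, str]]:
    """Pair each ASR chunk with the speaker turn it overlaps the most.

    Chunks that overlap no turn at all are labeled "UNKNOWN".
    """
    result = []
    for chunk in asr_chunks:
        best = max(speaker_turns, key=lambda turn: overlap(chunk, turn), default=None)
        if best is not None and overlap(chunk, best) > 0:
            speaker = best.label
        else:
            speaker = "UNKNOWN"
        result.append((speaker, chunk.label))
    return result


if __name__ == "__main__":
    turns = [
        Segment(0.0, 4.0, "SPEAKER_00"),
        Segment(4.0, 9.0, "SPEAKER_01"),
    ]
    chunks = [
        Segment(0.2, 3.5, "Hi, how are you?"),
        Segment(3.8, 6.0, "Fine, thanks."),
        Segment(10.0, 11.0, "Bye."),
    ]
    for speaker, text in attribute_speakers(chunks, turns):
        print(f"{speaker}: {text}")
```

Maximum overlap is easy to reason about but can misattribute a chunk that straddles a speaker change; splitting chunks at turn boundaries would be a natural refinement.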

Maintenance & Community

🚨 The package is not actively maintained. 🚨 The maintainers are seeking contributors.
Community interaction is encouraged via a Discord channel for ML for Audio and Speech.

Licensing & Compatibility

The README does not explicitly state a license. As a Hugging Face project it is likely Apache 2.0 or MIT, but this is unverified; check the repository's license file before use.
Compatibility for commercial use is not specified.

Limitations & Caveats

The README states that the package is not actively maintained, so bugs may go unfixed and updates are unlikely.
Punctuation restoration has only been tested on a limited set of Whisper models and a small audio dataset.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days