HuggingSound: a toolkit for speech-related tasks based on Hugging Face's tools
HuggingSound provides a Python toolkit for speech-related tasks, primarily focusing on speech recognition using Hugging Face's ecosystem. It's designed for researchers and developers needing an accessible interface for experiments with pre-trained models, fine-tuning, and evaluation.
How It Works
The toolkit leverages Hugging Face's transformers library to load and utilize various CTC (Connectionist Temporal Classification) models for speech recognition. It supports direct inference with pre-trained models, enhanced decoding via external language models (like KenLM), and provides functionality for model fine-tuning on custom datasets. The output includes transcriptions, character-level timestamps, and confidence probabilities.
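As a rough sketch of that workflow (following the usage shown in the project's README; the model ID, audio paths, and language-model files below are placeholders, and class and key names should be verified against the installed version):

```python
from huggingsound import SpeechRecognitionModel, KenshoLMDecoder

# Load a pre-trained CTC model from the Hugging Face Hub (placeholder model ID).
model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")
audio_paths = ["/path/to/first.mp3", "/path/to/second.wav"]

# Plain CTC decoding: each result is expected to carry the transcription plus
# character-level timestamps and per-character probabilities.
for result in model.transcribe(audio_paths):
    print(result["transcription"])
    print(result["start_timestamps"], result["end_timestamps"])
    print(result["probabilities"])

# Optionally boost decoding with an external KenLM language model
# (the KenshoLMDecoder backend assumes pyctcdecode is installed).
decoder = KenshoLMDecoder(
    model.token_set,
    lm_path="/path/to/lm.binary",           # KenLM arpa/binary file (placeholder)
    unigrams_path="/path/to/unigrams.txt",  # vocabulary used to build the LM (placeholder)
)
transcriptions = model.transcribe(audio_paths, decoder=decoder)
```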
Quick Start & Requirements
Install from PyPI: pip install huggingsound
ffmpeg is also needed (sudo apt-get install ffmpeg). Additional usage examples are available in the examples folder.
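For fine-tuning and evaluation, a hedged sketch along the lines of the README's TokenSet-based API (the base checkpoint, output directory, audio paths, and transcriptions are placeholder assumptions):

```python
from huggingsound import SpeechRecognitionModel, TokenSet

# Base CTC checkpoint to fine-tune (placeholder; use any compatible wav2vec2-style model).
model = SpeechRecognitionModel("facebook/wav2vec2-large-xlsr-53")

# Characters the fine-tuned CTC head will be able to emit.
token_set = TokenSet(list("abcdefghijklmnopqrstuvwxyz'"))

# Each training entry pairs an audio file with its reference transcription (placeholders).
train_data = [
    {"path": "/path/to/sample1.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/sample2.wav", "transcription": "violence is the last refuge of the incompetent"},
]

# Fine-tune and write the resulting model to the given output directory.
model.finetune("finetuned-model", train_data=train_data, token_set=token_set)

# Compute word/character error rates against labeled references.
print(model.evaluate(train_data))  # e.g. {"wer": ..., "cer": ...}
```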
Highlighted Details
Maintenance & Community
The project is maintained by Jonatas Grosman. Contributions, including documentation improvements, are welcome. A citation format is provided.
Licensing & Compatibility
The project does not explicitly state a license in the README.
Limitations & Caveats
The documentation is noted as incomplete, with users encouraged to consult the source code or open issues for guidance. The project's scope is intentionally kept simple, implying advanced features or extensive support may be limited.
Last activity: about 1 year ago; the project is flagged as inactive.