huggingsound  by jonatasgrosman

Speech toolkit for speech-related tasks based on Hugging Face's tools

created 3 years ago
458 stars

Top 67.0% on sourcepulse

Project Summary

HuggingSound is a Python toolkit for speech-related tasks built on Hugging Face's ecosystem, with a primary focus on speech recognition. It targets researchers and developers who need an accessible interface for running pre-trained models, fine-tuning them, and evaluating the results.

How It Works

The toolkit uses Hugging Face's transformers library to load and run CTC (Connectionist Temporal Classification) models for speech recognition. It supports direct inference with pre-trained models, decoding boosted by an external language model (such as KenLM), and fine-tuning on custom datasets. Each result includes the transcription, character-level timestamps, and per-character confidence probabilities.
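To make the CTC step concrete, here is a minimal sketch of greedy CTC decoding in plain Python. It is not HuggingSound's implementation: the function name `ctc_greedy_decode`, the toy vocabulary, and the choice of index 0 as the blank token are assumptions for illustration. It picks the most likely symbol per frame, collapses consecutive repeats, drops blanks, and records the frame span and peak probability of each emitted character.

```python
from typing import List, Tuple


def ctc_greedy_decode(
    frame_probs: List[List[float]],
    vocab: List[str],
    blank_id: int = 0,  # assumption: the blank token sits at index 0
) -> Tuple[str, List[int], List[int], List[float]]:
    """Return (text, start_frames, end_frames, probabilities) for one utterance."""
    chars: List[str] = []
    starts: List[int] = []
    ends: List[int] = []
    probs: List[float] = []
    prev_id = blank_id

    for t, row in enumerate(frame_probs):
        # Greedy choice: the highest-probability symbol in this frame.
        best_id = max(range(len(row)), key=row.__getitem__)
        if best_id == blank_id:
            # A blank separates characters, so a later repeat starts a new one.
            prev_id = blank_id
            continue
        if best_id == prev_id:
            # Same symbol on consecutive frames: extend the current character.
            ends[-1] = t + 1
            probs[-1] = max(probs[-1], row[best_id])
        else:
            chars.append(vocab[best_id])
            starts.append(t)
            ends.append(t + 1)
            probs.append(row[best_id])
        prev_id = best_id

    return "".join(chars), starts, ends, probs


# Toy example: 5 frames over the vocabulary [blank, "h", "i"].
toy_vocab = ["_", "h", "i"]
toy_frames = [
    [0.1, 0.8, 0.1],    # h
    [0.2, 0.7, 0.1],    # h again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # i
    [0.3, 0.1, 0.6],    # i again, merged
]
text, start_frames, end_frames, char_probs = ctc_greedy_decode(toy_frames, toy_vocab)
print(text, start_frames, end_frames, char_probs)
```

The spans here are frame indices; turning them into times requires the model's frame stride, which depends on the checkpoint. An LM-boosted decoder replaces the greedy per-frame choice with a beam search scored by the language model.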

Quick Start & Requirements

  • Install via pip: pip install huggingsound
  • Requires Python 3.8+.
  • MP3 support requires ffmpeg (on Debian/Ubuntu: sudo apt-get install ffmpeg).
  • Official documentation is sparse; refer to the repository's examples folder.
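After installation, a first transcription might look like the sketch below. It follows the usage shown in the project's README as best understood here; the model identifier, the audio path, and the result keys (`transcription`, `probabilities`) should be checked against the installed version. The helper names `transcribe_files` and `format_result` are hypothetical and not part of the library.

```python
from typing import Dict, List

# Assumption: a CTC checkpoint published by the project's author on the Hugging Face Hub.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-english"


def transcribe_files(audio_paths: List[str], model_id: str = MODEL_ID) -> List[Dict]:
    """Load a model and transcribe a list of audio files (hypothetical helper)."""
    # Imported lazily so format_result can be used without the package installed.
    from huggingsound import SpeechRecognitionModel

    model = SpeechRecognitionModel(model_id)
    return model.transcribe(audio_paths)


def format_result(result: Dict) -> str:
    """Summarize one result dict as text plus its mean character probability."""
    probs = result.get("probabilities") or []
    mean_prob = sum(probs) / len(probs) if probs else 0.0
    return f'{result["transcription"]} (mean char prob {mean_prob:.2f})'


# Usage (downloads the model on first run; the path is a placeholder):
# for result in transcribe_files(["/path/to/audio.wav"]):
#     print(format_result(result))
```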

Highlighted Details

  • Supports LM-boosted decoding with Kensho, Parlance, and Flashlight decoders.
  • Outputs character-level timestamps and probabilities alongside transcriptions.
  • Enables fine-tuning of speech recognition models on custom audio-transcription pairs.
  • Includes an evaluation module for calculating Word Error Rate (WER) and Character Error Rate (CER).
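For readers unfamiliar with the two metrics in the last item, the sketch below computes corpus-level WER and CER in plain Python using Levenshtein edit distance. It is an illustration of the standard definitions, not the toolkit's evaluation code, which may normalize text differently; the function names are assumptions.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(references: List[Sequence], hypotheses: List[Sequence]) -> float:
    """Total edit errors divided by total reference length across the corpus."""
    total = sum(len(r) for r in references)
    if total == 0:
        raise ValueError("references must contain at least one token")
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / total


def wer(references: List[str], hypotheses: List[str]) -> float:
    """Word Error Rate: edit distance over whitespace-separated words."""
    return error_rate([r.split() for r in references], [h.split() for h in hypotheses])


def cer(references: List[str], hypotheses: List[str]) -> float:
    """Character Error Rate: edit distance over characters, spaces included."""
    return error_rate([list(r) for r in references], [list(h) for h in hypotheses])


refs = ["the cat sat"]
hyps = ["the cat sit"]
print(f"WER={wer(refs, hyps):.3f} CER={cer(refs, hyps):.3f}")
```

Errors are summed over the whole corpus before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.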

Maintenance & Community

The project is maintained by Jonatas Grosman. Contributions are welcome, including documentation improvements. The README includes a citation entry for referencing the project.

Licensing & Compatibility

The project does not explicitly state a license in the README.

Limitations & Caveats

The README acknowledges that the documentation is incomplete and encourages users to consult the source code or open an issue for guidance. The project's scope is intentionally kept simple, so advanced features and extensive support may be limited.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 10 stars in the last 90 days
