Speech tools/data for cloudless ASR, plus TTS voice training
This repository provides open-source tools and data for building cloudless Automatic Speech Recognition (ASR) systems. It targets developers and researchers in Natural Language Processing (NLP) who need to create custom speech models, offering scripts for data processing, model training, and integration with Kaldi and wav2letter++.
How It Works
The project leverages various open-source speech and text corpora (e.g., VoxForge, LibriSpeech, Common Voice, Europarl) to train ASR models. It includes Python scripts for data cleaning, format conversion, noise augmentation, and language model generation using KenLM. The supported ASR back ends are Kaldi nnet3 chain models and wav2letter++, with grapheme-to-phoneme (G2P) conversion via Sequitur and support for model adaptation.
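As a concrete illustration of the language-model step, here is a minimal sketch that drives KenLM's lmplz and build_binary command-line tools from Python; the corpus path, n-gram order, and output file names are illustrative assumptions rather than values taken from the repository's scripts.

```python
# Minimal sketch: build an n-gram language model with KenLM.
# Assumes the lmplz and build_binary binaries are on PATH and that
# corpus.txt is a cleaned, one-sentence-per-line text file
# (paths and n-gram order are illustrative, not from the repo).
import subprocess

CORPUS = 'corpus.txt'
ARPA   = 'lm.arpa'
BINARY = 'lm.bin'
ORDER  = '4'

# Estimate a 4-gram model in ARPA format.
subprocess.run(['lmplz', '-o', ORDER, '--text', CORPUS, '--arpa', ARPA],
               check=True)

# Convert the ARPA file to KenLM's compact binary format for faster loading.
subprocess.run(['build_binary', ARPA, BINARY], check=True)
```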
Quick Start & Requirements
Configuration is handled through a ~/.speechrc file. Estimated setup time can be substantial due to data acquisition and model training. The repository includes example decoding scripts (kaldi_decode_wav.py, kaldi_decode_live.py) and a Docker image.
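The kaldi_decode_wav.py example points at offline decoding of a WAV file against a trained Kaldi nnet3 model. The sketch below shows roughly what such decoding can look like using the py-kaldi-asr package; the package choice, model path, and audio format are assumptions not confirmed by this summary.

```python
# Minimal sketch: decode a WAV file with a Kaldi nnet3 chain model.
# Assumes the py-kaldi-asr package is installed and that a pre-trained
# model has been unpacked to MODELDIR (paths are hypothetical).
from kaldiasr.nnet3 import KaldiNNet3OnlineModel, KaldiNNet3OnlineDecoder

MODELDIR = 'models/kaldi-generic-en-tdnn_sp'  # hypothetical model location
WAVFILE  = 'test.wav'                         # assumed 16 kHz mono WAV

model   = KaldiNNet3OnlineModel(MODELDIR)
decoder = KaldiNNet3OnlineDecoder(model)

if decoder.decode_wav_file(WAVFILE):
    text, likelihood = decoder.get_decoded_string()
    print('decoded: %s (likelihood %f)' % (text, likelihood))
else:
    print('decoding failed')
```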
Highlighted Details
Maintenance & Community
The project is maintained by Guenter Bartsch and Marc Puels, with contributions from Paul Guyot. There are no explicit links to community forums (Discord/Slack) or a public roadmap in the README.
Licensing & Compatibility
The project's own scripts and data are LGPLv3 licensed. It notes that some scripts and files are based on original works, and users should check copyright headers for specific licensing details. This license generally permits commercial use and linking with closed-source applications.
Limitations & Caveats
The README explicitly states that the scripts do not form a complete end-user application and are primarily for developers. Setup requires considerable effort in data collection and configuration. Some scripts are noted as experimental (e.g., Zamia-TTS). The project appears to have had its last update around 2018, suggesting potential for outdated dependencies or lack of active maintenance.