deepspeech-german by AASHISHAG

ASR module using Mozilla DeepSpeech for German speech

Created 6 years ago

319 stars

Top 85.0% on SourcePulse

Project Summary

This repository provides a comprehensive pipeline for training and deploying German Automatic Speech Recognition (ASR) models using Mozilla's DeepSpeech toolkit. It is targeted at researchers and developers needing a robust, end-to-end solution for German speech-to-text conversion, offering a detailed guide from data preparation to model training and optimization.

How It Works

The project builds on Mozilla DeepSpeech, an open-source ASR toolkit based on Baidu's Deep Speech research and implemented in TensorFlow. Training is end-to-end: the model learns to map acoustic features directly to character sequences, with no intermediate steps such as phoneme alignment. This removes the need for hand-built components such as a pronunciation lexicon, which simplifies the pipeline and can improve accuracy. The project also uses KenLM for language modeling, which improves the ASR output by taking linguistic context into account.
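To make the "acoustic features directly to character sequences" idea concrete, here is a minimal sketch of greedy CTC decoding. It assumes that the acoustic model emits one probability distribution over characters per audio frame and was trained with a CTC objective, as DeepSpeech is; the summary above does not spell this out. The function name, the `_` blank symbol, and the three-letter alphabet are hypothetical, and DeepSpeech's real decoder uses a beam search combined with the KenLM scorer rather than this greedy shortcut.

```python
from typing import List, Sequence

BLANK = "_"  # hypothetical symbol standing in for the CTC blank label


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str]) -> str:
    """Pick the most likely label per frame, merge repeats, then drop blanks."""
    # Index of the highest-probability label for every frame.
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]

    out: List[str] = []
    prev = None
    for label in best:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)


# Toy example: five frames over a made-up alphabet (blank, "j", "a").
alphabet = ["_", "j", "a"]
frames = [
    [0.1, 0.8, 0.1],    # j
    [0.1, 0.7, 0.2],    # j (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # a
    [0.2, 0.1, 0.7],    # a (repeat, merged)
]
print(ctc_greedy_decode(frames, alphabet))  # ja
```

The blank label is what lets the model output genuine double letters: two identical characters separated by a blank frame are kept as two characters instead of being merged.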

Quick Start & Requirements

Installation: Clone the repository and fetch its submodules (git pull --recurse-submodules). A dedicated Python 3.7.9 environment, created with pyenv and virtualenv, is recommended. Install the dependencies with pip3 install -r python_requirements.txt.
Prerequisites: Linux environment (macOS/Windows possible with adjustments), TensorFlow 1.15 (for Python 3.7), pyenv, virtualenv, git, cmake, make, and wget. CUDA is recommended for training acceleration.
Data: Requires downloading and preparing several German speech corpora (TUDA-De, Mozilla Common Voice, Voxforge, M-AILABS, Spoken Wikipedia) and a German text corpus for language model training.
Links: Paper, DeepSpeech-API.
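Before starting the installation, it can help to check that the prerequisites listed above are actually available. The following is a minimal sketch, not part of the repository: the function names and constants are hypothetical, and the tool list is simply copied from the prerequisites above. It only checks that the executables are on PATH and that the interpreter is Python 3.7; it does not verify TensorFlow or CUDA.

```python
import shutil
import sys
from typing import Iterable, List, Tuple

# Tool names taken from the prerequisites listed above.
REQUIRED_TOOLS = ("pyenv", "virtualenv", "git", "cmake", "make", "wget")
# TensorFlow 1.15 targets Python 3.7, so that is the version checked here.
RECOMMENDED_PYTHON = (3, 7)


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_matches(version: Tuple[int, int] = RECOMMENDED_PYTHON) -> bool:
    """True when the running interpreter has the given major.minor version."""
    return tuple(sys.version_info[:2]) == version


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
    if not python_matches():
        print("Warning: running Python %d.%d, but 3.7 is recommended."
              % tuple(sys.version_info[:2]))
```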

Highlighted Details

Detailed instructions for data acquisition and pre-processing for multiple German speech corpora.
Guidance on building a 3-gram language model using KenLM.
Scripts for training, fine-tuning, transfer learning (e.g., English to German), and hyper-parameter optimization.
Includes steps for generating TensorFlow Lite models for resource-constrained devices.
Provides pre-trained models for various DeepSpeech versions (v0.5.0 to v0.9.0).
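To illustrate what a 3-gram language model estimates, the sketch below counts word triples in a tiny made-up German corpus and computes the probability of a word given its two predecessors. This is a toy stand-in and not how the repository builds its model: KenLM estimates n-gram probabilities with modified Kneser-Ney smoothing, while this example uses plain unsmoothed relative frequencies. The function names and the corpus are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def trigram_counts(sentences: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count trigrams and their two-word contexts, padding each sentence."""
    tri: Counter = Counter()
    ctx: Counter = Counter()
    for sentence in sentences:
        tokens = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(2, len(tokens)):
            tri[tuple(tokens[i - 2:i + 1])] += 1  # (w1, w2, w3)
            ctx[tuple(tokens[i - 2:i])] += 1      # (w1, w2)
    return tri, ctx


def trigram_prob(tri: Counter, ctx: Counter, w1: str, w2: str, w3: str) -> float:
    """Unsmoothed estimate of P(w3 | w1, w2); 0.0 for unseen contexts."""
    seen = ctx[(w1, w2)]
    return tri[(w1, w2, w3)] / seen if seen else 0.0


# Made-up corpus: "hunger" follows "ich habe" in two of three sentences.
corpus = ["ich habe hunger", "ich habe durst", "ich habe hunger"]
tri, ctx = trigram_counts(corpus)
print(trigram_prob(tri, ctx, "ich", "habe", "hunger"))
print(trigram_prob(tri, ctx, "ich", "habe", "durst"))
```

During decoding, scores like these let the recognizer prefer word sequences that are plausible in German over acoustically similar but unlikely ones.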

Maintenance & Community

The project is based on Mozilla DeepSpeech v0.9.3, and the README points to Mozilla's repository for the latest updates. The README mentions no community channels (such as Discord or Slack) and gives no signs of active maintenance.

Licensing & Compatibility

The README does not explicitly state a license for this repository's own code or scripts. It relies on Mozilla DeepSpeech, which is licensed under the Mozilla Public License 2.0. Suitability for commercial use depends on the licenses of all included components and datasets.

Limitations & Caveats

The project pins TensorFlow 1.15, which is outdated and supports Python only up to 3.7. Some dependencies, such as audiomate, may need to be installed from patched versions on GitHub. Training is resource-intensive, and although CUDA is recommended to speed it up, setting CUDA up can be complex. The reported Word Error Rate (WER) results date from 2019 and may not reflect current state-of-the-art performance.
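For readers who want to re-measure WER on their own recordings, the metric is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. Below is a minimal sketch; the function name and the example sentences are hypothetical, and the repository's own evaluation may normalize text differently before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One word ("ein") is missing from a four-word reference.
print(word_error_rate("das ist ein test", "das ist test"))  # 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.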
