icefall by k2-fsa

Speech-related recipes for various datasets using k2-fsa and lhotse

Created 5 years ago

1,445 stars

Top 27.5% on SourcePulse

1 Expert Loves This Project

Patrick von Platen (patrickvonplaten), Author of Hugging Face Diffusers; Research Engineer at Mistral

Project Summary

Icefall provides a comprehensive suite of recipes for training and deploying Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models. It targets researchers and engineers working with speech technologies, offering state-of-the-art models and extensive dataset support, enabling rapid prototyping and benchmarking.

How It Works

Icefall builds on two libraries: k2 (the finite-state automaton library from the k2-fsa project, used for loss computation and decoding) and lhotse (speech data preparation). Its recipes cover architectures such as TDNN, LSTM, Conformer, and Zipformer, trained with CTC or Transducer losses. Trained models can be deployed through the companion runtimes sherpa, sherpa-ncnn, and sherpa-onnx, which makes it easier to embed them in applications.
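To make the CTC part concrete, here is a minimal sketch of how a CTC best-path (greedy) output is turned into tokens: consecutive repeated labels are merged, then blank symbols are dropped. This is plain Python written for illustration, not icefall's or k2's decoding code; the function name `ctc_greedy_collapse` and the choice of `0` as the blank index are assumptions.

```python
from itertools import groupby
from typing import List, Sequence

BLANK_ID = 0  # assumption: the blank symbol has index 0


def ctc_greedy_collapse(frame_ids: Sequence[int], blank_id: int = BLANK_ID) -> List[int]:
    """Collapse per-frame argmax labels into an output token sequence."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_ids)]
    # Step 2: remove blanks, which only serve to separate repeated tokens.
    return [label for label in merged if label != blank_id]


if __name__ == "__main__":
    # Nine frames: token 3 appears twice in the output because a blank separates the runs.
    frames = [0, 3, 3, 0, 0, 3, 5, 5, 0]
    print(ctc_greedy_collapse(frames))  # prints [3, 3, 5]
```

The order of the two steps matters: dropping blanks first would merge the two separate occurrences of token 3 into one.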

Quick Start & Requirements

Installation and detailed usage instructions are available in the official documentation.
Colab notebooks are provided for many recipes, allowing browser-based experimentation without local setup.

Highlighted Details

Extensive ASR dataset support, including LibriSpeech, Aishell, CommonVoice, and more.
Wide range of model architectures and loss functions (CTC, MMI, Transducer).
Achieves state-of-the-art results on benchmarks like LibriSpeech (e.g., 2.00% WER with Zipformer-large).
Supports deployment via TorchScript, ONNX, and NCNN for C++ environments.
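For readers unfamiliar with the WER figure quoted above, the following sketch shows how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. It is an illustrative standard-library implementation, not the scoring code icefall uses, and the function name `word_error_rate` is an assumption.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(f"{word_error_rate(ref, hyp):.2%}")  # prints 33.33%
```

Benchmark WER is normally reported at corpus level, that is, total errors over total reference words across all utterances, rather than as an average of per-utterance rates.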

Maintenance & Community

The project is actively maintained by the k2-fsa community.
Further details on contributing and community engagement can be found in the documentation.

Licensing & Compatibility

The project is released under the Apache 2.0 license, a permissive license that allows commercial use and integration into closed-source projects.

Limitations & Caveats

While many recipes include Colab notebooks, full local setup may require significant computational resources (e.g., GPUs) and specific CUDA versions for optimal performance.

Health Check

Last Commit: 2 days ago

Responsiveness: 1 day

Pull Requests (30d): 4

Issues (30d): 3

Star History

19 stars in the last 30 days

Explore Similar Projects

speech-recognition-uk by egorsmkv

Resource collection for Ukrainian speech AI

Created 6 years ago

Updated 10 months ago

awesome-russian-speech by alphacep

Curated list of Russian speech tech resources

Created 3 years ago

Updated 3 months ago

deepspeech-german by AASHISHAG

ASR module using Mozilla DeepSpeech for German speech

Created 7 years ago

Updated 3 years ago

Speech-to-Text-Russian by SergeyShk

Speech-to-Text tool for Russian using pykaldi

Created 6 years ago

Updated 1 year ago

Starred by Maxime Labonne (Head of Post-Training at Liquid AI).

dataspeech by huggingface

Suite of scripts for tagging speech datasets, especially for TTS model development

Created 2 years ago

Updated 1 year ago

zamia-speech by gooofy

Speech tools/data for cloudless ASR, plus TTS voice training

Created 9 years ago

Updated 5 years ago

athena by athena-team

Open-source speech processing engine for industrial/academic use

Created 6 years ago

Updated 3 years ago

TransformerTTS by spring-media

TensorFlow 2 implementation for non-autoregressive text-to-speech

Created 6 years ago

Updated 2 years ago

sherpa-onnx by k2-fsa

Speech toolkit for local, offline speech AI tasks via ONNX

Created 3 years ago

Updated 1 day ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Piotr Dąbkowski (Cofounder of ElevenLabs), and 2 more.

PaddleSpeech by PaddlePaddle

Speech toolkit for ASR, TTS, speaker verification, translation, and keyword spotting

Created 8 years ago

Updated 2 weeks ago

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Benjamin Bolte (Cofounder of K-Scale Labs), and 3 more.

espnet by espnet

End-to-end speech processing toolkit for various speech tasks

Created 8 years ago

Updated 1 day ago

Starred by Jason Huggins (Creator of Selenium), Michael Han (Cofounder of Unsloth), and 11 more.

TTS by coqui-ai

Deep learning toolkit for Text-to-Speech, research-tested

Created 6 years ago

Updated 1 year ago
