athena by athena-team

Open-source speech processing engine for industrial/academic use

Created 6 years ago

965 stars

Top 38.2% on SourcePulse

Project Summary

Athena is an open-source, TensorFlow-based speech processing engine designed for both industrial applications and academic research. It offers a comprehensive suite of end-to-end models for Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Voice Activity Detection (VAD), and Keyword Spotting (KWS), aiming to make advanced speech technologies accessible.

How It Works

Athena implements hybrid Attention/CTC and streaming methods for ASR, and FastSpeech/FastSpeech2/Transformer architectures for TTS. It features a Kaldi-free, Pythonic feature extractor (Athena_transform) for ease of use. The engine supports unsupervised pre-training (MPC), multi-GPU training via Horovod, and C++ deployment for local servers, offering flexibility in model development and deployment.
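As a rough illustration of what "hybrid Attention/CTC" usually means, the sketch below interpolates a CTC term and an attention-decoder term with a single weight, both for the training loss and for ranking decoded hypotheses. The function names, the default weight of 0.3, and the input format are assumptions made for this example; they are not taken from Athena's code or configuration.

```python
from typing import List, Tuple


def joint_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate the two losses: w * L_ctc + (1 - w) * L_att.

    ctc_weight is a hypothetical hyperparameter name, not Athena's.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


def rescore(
    hypotheses: List[Tuple[str, float, float]], ctc_weight: float = 0.3
) -> List[Tuple[str, float]]:
    """Rank hypotheses by the interpolated log-probability, best first.

    Each hypothesis is (text, ctc_log_prob, attention_log_prob).
    """
    scored = [
        (text, ctc_weight * ctc_lp + (1.0 - ctc_weight) * att_lp)
        for text, ctc_lp, att_lp in hypotheses
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(joint_loss(10.0, 20.0))
    print(rescore([("hello word", -5.0, -1.0), ("hello world", -1.0, -4.0)], 0.5))
```

A larger weight leans on the CTC term, which tends to enforce monotonic alignment; a smaller one leans on the attention decoder. The value that works best is an empirical choice.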

Quick Start & Requirements

Installation: Requires TensorFlow 2.3 or 2.8. Install the matching tensorflow-gpu package, install the dependencies listed in requirements.txt, then build and install Athena from source.
- TF 2.3: pip install tensorflow-gpu==2.3.0
- TF 2.8: pip install tensorflow-gpu==2.8.0
- Dependencies: pip install -r requirements.txt
- Build: python setup.py bdist_wheel sdist && python -m pip install --ignore-installed dist/athena-2.0*.whl
Prerequisites: TensorFlow (GPU recommended), Python.
Resources: Pre-trained models are available at Athena-model-zoo.
Demos: See Run demo section for ASR, TTS, and VAD demos.
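Because only the TensorFlow 2.3 and 2.8 series are listed, it can help to check the installed version before building. The following is a minimal sketch, not part of Athena: the helper names are hypothetical, and it assumes TensorFlow was installed under the distribution name tensorflow-gpu or tensorflow.

```python
import re
from importlib import metadata
from typing import Optional

# Release series named in the installation notes above.
SUPPORTED_TF_SERIES = ("2.3", "2.8")


def tf_series(version: str) -> Optional[str]:
    """Return the 'major.minor' prefix of a version string, or None if it has none."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def is_supported_tf(version: str) -> bool:
    """True when the version belongs to one of the listed series."""
    return tf_series(version) in SUPPORTED_TF_SERIES


def installed_tf_version() -> Optional[str]:
    """Look up the installed TensorFlow version without importing TensorFlow."""
    for dist_name in ("tensorflow-gpu", "tensorflow"):
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    version = installed_tf_version()
    if version is None:
        print("TensorFlow is not installed")
    elif is_supported_tf(version):
        print(f"TensorFlow {version} matches a listed series")
    else:
        print(f"TensorFlow {version} is outside the listed series {SUPPORTED_TF_SERIES}")
```

Reading the version from package metadata avoids the slow TensorFlow import and works even when the GPU libraries are not yet configured.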

Highlighted Details

Supports Transformer, Conformer, and AV-Transformer/Conformer architectures for ASR.
Includes FastSpeech, FastSpeech2, Tacotron2, and Transformer TTS models.
Offers C++ runtime for deployment and WFST-based decoding.
Provides performance benchmarks for ASR, TTS, VAD, and KWS tasks on various datasets.

Maintenance & Community

The most recent updates noted in the README date from 2022. The primary communication channel is a WeChat group.

Licensing & Compatibility

The README does not explicitly state a license. TensorFlow is a dependency.

Limitations & Caveats

C++ deployment is currently limited to ASR tasks. The installation instructions give separate setup commands for TensorFlow 2.3 and 2.8, so an environment prepared for one version may not work unchanged with the other.
