Open-source speech processing engine for industrial/academic use
Athena is an open-source, TensorFlow-based speech processing engine designed for both industrial applications and academic research. It offers a comprehensive suite of end-to-end models for Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Voice Activity Detection (VAD), and Keyword Spotting (KWS), aiming to make advanced speech technologies accessible.
How It Works
Athena implements hybrid Attention/CTC and streaming methods for ASR, and FastSpeech/FastSpeech2/Transformer architectures for TTS. It features a Kaldi-free, Pythonic feature extractor (Athena_transform) for ease of use. The engine supports unsupervised pre-training (MPC), multi-GPU training via Horovod, and C++ deployment for local servers, offering flexibility in model development and deployment.
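The hybrid Attention/CTC idea mentioned above can be illustrated with a minimal, framework-free sketch. It assumes the commonly used formulation in which the training objective is a weighted sum of a CTC loss and an attention-decoder loss, and it shows the standard CTC collapse rule (merge repeated labels, then drop blanks). The function names, the ctc_weight parameter, and its default value are hypothetical and are not taken from Athena's code.

```python
from typing import List, Sequence


def joint_loss(ctc_loss: float, att_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss and an attention loss.

    ctc_weight is an illustrative hyperparameter (assumption), not an Athena setting.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def ctc_greedy_collapse(frame_ids: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse a per-frame best-label path into an output label sequence.

    Consecutive repeats are merged first, then blank labels are removed,
    so a blank between two identical labels keeps both of them.
    """
    output: List[int] = []
    previous = None
    for label in frame_ids:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


if __name__ == "__main__":
    print(joint_loss(2.0, 1.0, ctc_weight=0.5))
    print(ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0]))
```

A weight of 0 reduces the objective to attention only and a weight of 1 to CTC only. The value used in practice is a tuning choice that this summary does not specify.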
Quick Start & Requirements
Install with pip install tensorflow-gpu==<version>, then pip install -r requirements.txt, then build from source.
TensorFlow 2.3: pip install tensorflow-gpu==2.3.0
TensorFlow 2.8: pip install tensorflow-gpu==2.8.0
Build and install: python setup.py bdist_wheel sdist && python -m pip install --ignore-installed dist/athena-2.0*.whl
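Before building, it may help to confirm that the installed TensorFlow matches one of the releases named in the install commands. The sketch below is a hypothetical helper, not part of Athena. It assumes that only the 2.3 and 2.8 release lines are documented and that the package is installed under the distribution name tensorflow-gpu or tensorflow.

```python
from importlib import metadata
from typing import Optional

# Release lines named in the install commands (assumption: no others are documented).
DOCUMENTED_TF_RELEASES = {(2, 3), (2, 8)}


def is_documented_tf(version: str) -> bool:
    """Return True if a version string such as '2.8.0' is on a documented release line."""
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        return False
    return major_minor in DOCUMENTED_TF_RELEASES


def installed_tf_version() -> Optional[str]:
    """Look up the installed TensorFlow version without importing TensorFlow."""
    for dist_name in ("tensorflow-gpu", "tensorflow"):
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = installed_tf_version()
    if found is None:
        print("TensorFlow is not installed")
    elif is_documented_tf(found):
        print(f"TensorFlow {found} matches a documented release line")
    else:
        print(f"TensorFlow {found} is not one of the documented release lines")
```

Reading the version from package metadata avoids importing TensorFlow itself, which keeps the check fast and usable even when the GPU runtime is not yet configured.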
Highlighted Details
Maintenance & Community
The most recent updates noted for the project date from 2022. Communication is primarily through a WeChat group.
Licensing & Compatibility
The README does not explicitly state a license. TensorFlow is a dependency.
Limitations & Caveats
The C++ deployment is currently limited to ASR tasks. Installation instructions give separate setup commands for TensorFlow 2.3 and 2.8, which suggests the two versions may not be interchangeable.