athena  by athena-team

Open-source speech processing engine for industrial/academic use

created 5 years ago
953 stars

Project Summary

Athena is an open-source, TensorFlow-based speech processing engine designed for both industrial applications and academic research. It offers a comprehensive suite of end-to-end models for Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Voice Activity Detection (VAD), and Keyword Spotting (KWS), aiming to make advanced speech technologies accessible.

How It Works

Athena implements hybrid Attention/CTC and streaming methods for ASR, and FastSpeech/FastSpeech2/Transformer architectures for TTS. It features a Kaldi-free, Pythonic feature extractor (Athena_transform) for ease of use. The engine supports unsupervised pre-training (MPC), multi-GPU training via Horovod, and C++ deployment for local servers, offering flexibility in model development and deployment.
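As a rough illustration of what "hybrid Attention/CTC" usually means in training, the two losses are commonly combined as a weighted sum. The sketch below shows that standard interpolation only. The function name hybrid_loss and the default weight of 0.3 are assumptions for illustration; they are not taken from Athena's code or configuration.

```python
def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss and an attention (cross-entropy) loss.

    ctc_weight is the share given to the CTC term; the attention term
    gets the remainder. The 0.3 default is a hypothetical value, not
    one documented by the project.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


if __name__ == "__main__":
    # Equal weighting of a CTC loss of 2.0 and an attention loss of 1.0.
    print(hybrid_loss(2.0, 1.0, ctc_weight=0.5))
```

Setting ctc_weight to 0 or 1 reduces this to a pure attention or pure CTC objective, which is one reason the interpolated form is convenient for experiments.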

Quick Start & Requirements

  • Installation: Requires TensorFlow 2.3 or 2.8. Install the matching tensorflow-gpu release, install the dependencies with pip install -r requirements.txt, then build and install the wheel from source.
    • TF 2.3: pip install tensorflow-gpu==2.3.0
    • TF 2.8: pip install tensorflow-gpu==2.8.0
    • Build: python setup.py bdist_wheel sdist && python -m pip install --ignore-installed dist/athena-2.0*.whl
  • Prerequisites: TensorFlow (GPU recommended), Python.
  • Resources: Pre-trained models are available at Athena-model-zoo.
  • Demos: See Run demo section for ASR, TTS, and VAD demos.
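
Because only TensorFlow 2.3 and 2.8 are listed above, it can help to check the installed version before building. The helper below is a minimal sketch of such a check; is_supported_tf and SUPPORTED_TF are hypothetical names and are not part of Athena.

```python
# Major/minor pairs listed in the installation notes above.
SUPPORTED_TF = {(2, 3), (2, 8)}


def is_supported_tf(version: str) -> bool:
    """Return True if a version string such as '2.8.0' is TensorFlow 2.3.x or 2.8.x."""
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Malformed or incomplete version strings are treated as unsupported.
        return False
    return major_minor in SUPPORTED_TF


if __name__ == "__main__":
    # With TensorFlow installed, one could pass tf.__version__ instead.
    print(is_supported_tf("2.8.0"))
```

With TensorFlow installed, the same function can be called as is_supported_tf(tf.__version__) after import tensorflow as tf.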

Highlighted Details

  • Supports Transformer, Conformer, and AV-Transformer/Conformer architectures for ASR.
  • Includes FastSpeech, FastSpeech2, Tacotron2, and Transformer TTS models.
  • Offers C++ runtime for deployment and WFST-based decoding.
  • Provides performance benchmarks for ASR, TTS, VAD, and KWS tasks on various datasets.

Maintenance & Community

The most recent updates noted for the project date from 2022. Community communication is primarily through a WeChat group.

Licensing & Compatibility

The README does not explicitly state a license. TensorFlow is a dependency.

Limitations & Caveats

The C++ deployment is currently limited to ASR tasks. The installation instructions give separate setup commands for TensorFlow 2.3 and 2.8, so the environment should match one of those two versions; other versions are not covered.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days
