athena by athena-team

Open-source speech processing engine for industrial/academic use

Created 5 years ago
959 stars

Top 38.4% on SourcePulse

Project Summary

Athena is an open-source, TensorFlow-based speech processing engine designed for both industrial applications and academic research. It offers a comprehensive suite of end-to-end models for Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Voice Activity Detection (VAD), and Keyword Spotting (KWS), aiming to make advanced speech technologies accessible.

How It Works

Athena implements hybrid Attention/CTC and streaming methods for ASR, and FastSpeech/FastSpeech2/Transformer architectures for TTS. It features a Kaldi-free, Pythonic feature extractor (Athena_transform) for ease of use. The engine supports unsupervised pre-training (MPC), multi-GPU training via Horovod, and C++ deployment for local servers, offering flexibility in model development and deployment.
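As a rough illustration of the hybrid Attention/CTC idea, such models are commonly trained on a weighted sum of a CTC loss and an attention-decoder loss. The sketch below is not taken from Athena's code: the function name hybrid_loss and the parameter ctc_weight (with its default of 0.3) are hypothetical, and the inputs stand in for loss values that a real model would compute.

```python
def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.3) -> float:
    """Interpolate between a CTC loss and an attention-decoder loss.

    Hypothetical helper for illustration only; the default weight is an
    assumption, not a value taken from Athena.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    # ctc_weight = 1.0 gives pure CTC training, 0.0 gives pure attention training.
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss
```

Setting ctc_weight between the two extremes lets the CTC term encourage monotonic alignments while the attention term models output dependencies, which is the usual motivation for the hybrid objective.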

Quick Start & Requirements

  • Installation: Requires TensorFlow 2.3 or 2.8. Install the matching tensorflow-gpu package, install the dependencies with pip install -r requirements.txt, then build and install Athena from source.
    • TF 2.3: pip install tensorflow-gpu==2.3.0
    • TF 2.8: pip install tensorflow-gpu==2.8.0
    • Build: python setup.py bdist_wheel sdist && python -m pip install --ignore-installed dist/athena-2.0*.whl
  • Prerequisites: TensorFlow (GPU recommended), Python.
  • Resources: Pre-trained models are available at Athena-model-zoo.
  • Demos: See Run demo section for ASR, TTS, and VAD demos.
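
Because only TensorFlow 2.3 and 2.8 are listed, it may be worth checking the installed version before building. The following is a minimal sketch under that assumption; SUPPORTED_TF_SERIES and is_supported_tf are hypothetical names, not part of Athena.

```python
# Hypothetical pre-install check: only the 2.3 and 2.8 series are listed
# in the installation notes, so anything else is treated as unsupported.
SUPPORTED_TF_SERIES = {(2, 3), (2, 8)}


def is_supported_tf(version: str) -> bool:
    """Return True if a version string such as '2.8.0' is in a listed series."""
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Malformed or incomplete version strings are treated as unsupported.
        return False
    return major_minor in SUPPORTED_TF_SERIES
```

In practice the argument would typically be tensorflow.__version__, read after the tensorflow-gpu package has been installed.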

Highlighted Details

  • Supports Transformer, Conformer, and AV-Transformer/Conformer architectures for ASR.
  • Includes FastSpeech, FastSpeech2, Tacotron2, and Transformer TTS models.
  • Offers C++ runtime for deployment and WFST-based decoding.
  • Provides performance benchmarks for ASR, TTS, VAD, and KWS tasks on various datasets.

Maintenance & Community

The most recent updates noted date from 2022. Communication happens primarily through a WeChat group.

Licensing & Compatibility

The README does not explicitly state a license. TensorFlow is a dependency.

Limitations & Caveats

The C++ deployment is currently limited to ASR tasks. The installation instructions give separate setup commands for TensorFlow 2.3 and 2.8, so the two versions may not be interchangeable.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
