NATSpeech by NATSpeech

PyTorch framework for non-autoregressive text-to-speech (NAR-TTS)

created 3 years ago
997 stars

Top 38.0% on sourcepulse

Project Summary

This repository provides a PyTorch framework for Non-Autoregressive Text-to-Speech (NAR-TTS), featuring official implementations of PortaSpeech and DiffSpeech. It targets researchers and developers in speech synthesis, offering a scalable platform for training and inference with a focus on high-quality, efficient speech generation.

How It Works

The framework is built on non-autoregressive models, which generate all speech feature frames in parallel rather than one at a time, so inference is significantly faster than with autoregressive models. During data processing it uses Montreal Forced Aligner (MFA) to align phonemes with the audio. It also includes a custom random-access dataset implementation for efficiently handling large speech datasets.
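Parallel generation depends on knowing, before decoding, how many output frames each input phoneme covers. Models such as FastSpeech handle this with a "length regulator" that repeats each phoneme's hidden vector by its duration. In training the durations come from an aligner, and at inference a duration predictor supplies them. The sketch below is a minimal, framework-free illustration of that generic idea under those assumptions. It is not NATSpeech's code, the function name length_regulate is hypothetical, and the exact mechanism differs between models (real implementations work on batched tensors).

```python
from typing import List, Sequence

Vector = List[float]


def length_regulate(phone_hidden: Sequence[Sequence[float]], durations: Sequence[int]) -> List[Vector]:
    """Expand phoneme-level hidden vectors to frame level (hypothetical helper).

    Each phoneme vector is repeated durations[i] times, as in a generic
    FastSpeech-style length regulator.
    """
    if len(phone_hidden) != len(durations):
        raise ValueError("expected exactly one duration per phoneme")
    frames: List[Vector] = []
    for vector, duration in zip(phone_hidden, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy each vector so the resulting frames can be modified independently.
        frames.extend(list(vector) for _ in range(duration))
    return frames


if __name__ == "__main__":
    hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy encoder outputs for 3 phonemes
    durations = [2, 1, 3]  # frames per phoneme (aligner-derived in training, predicted at inference)
    frames = length_regulate(hidden, durations)
    print(len(frames))  # → 6, i.e. sum(durations), known before any decoding
```

Because the output length is fixed by sum(durations) before decoding starts, a decoder can compute every frame in one parallel pass instead of feeding each generated frame back in as the next input.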

Quick Start & Requirements

  • Install: Clone the repository, set PYTHONPATH, create and activate a virtual environment, install dependencies via pip install -r requirements.txt, and install MFA using bash mfa_usr/install_mfa.sh.
  • Prerequisites: Python 3.6+, PyTorch 1.9.0+, NumPy 1.19.1, Cython, sox, and libsox-fmt-mp3. Tested on Linux/Ubuntu 18.04.
  • Resources: Pretrained models are available for download.
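The prerequisites mix Python packages with system tools, so a quick environment check before installing can save a failed setup. The script below is a hypothetical helper and not part of NATSpeech. It checks the Python version, compares the installed torch and numpy versions against the numbers listed above, and looks for sox on the PATH. Treating NumPy 1.19.1 as a minimum is an assumption, since the repository may pin it exactly. The script cannot detect libsox-fmt-mp3, which is a system package rather than an executable.

```python
import importlib
import shutil
import sys
from typing import Dict, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.9.0+cu111' into (1, 9, 0); local tags and suffixes like 'rc1' are dropped."""
    core = version.split("+", 1)[0]
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def version_at_least(installed: str, minimum: str) -> bool:
    """Compare numeric version components, padding the shorter tuple with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, or None if it is not importable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_environment() -> Dict[str, bool]:
    torch_v = installed_version("torch")
    numpy_v = installed_version("numpy")
    return {
        "python >= 3.6": sys.version_info >= (3, 6),
        "torch >= 1.9.0": torch_v is not None and version_at_least(torch_v, "1.9.0"),
        # Assumption: 1.19.1 treated as a minimum, not an exact pin.
        "numpy >= 1.19.1": numpy_v is not None and version_at_least(numpy_v, "1.19.1"),
        "sox on PATH": shutil.which("sox") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```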

Highlighted Details

  • Official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022).
  • Includes data processing tools using Montreal Forced Aligner.
  • Scalable framework for training and inference.
  • Hugging Face demos available.
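MFA writes its alignments as TextGrid files whose phone intervals are measured in seconds, while the acoustic model works in mel-spectrogram frames. The sketch below shows one common way to convert those intervals into per-phone frame counts. It assumes the intervals have already been parsed from the TextGrid into (phone, start, end) tuples and are contiguous, with silences kept as their own intervals. The 22050 Hz sample rate and 256-sample hop size are illustrative defaults, not values taken from NATSpeech's configs, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

# (phone, start_seconds, end_seconds), e.g. read from the phone tier of an MFA TextGrid
Interval = Tuple[str, float, float]


def intervals_to_frame_durations(
    intervals: Sequence[Interval],
    sample_rate: int = 22050,  # assumed audio sample rate
    hop_size: int = 256,  # assumed STFT hop length in samples
) -> List[int]:
    """Convert contiguous phone intervals (seconds) into per-phone mel-frame counts."""
    if not intervals:
        return []
    frames_per_second = sample_rate / hop_size
    # Round the cumulative boundaries, not each interval's length,
    # so rounding errors cannot accumulate across the utterance.
    boundaries = [round(intervals[0][1] * frames_per_second)]
    for i, (phone, start, end) in enumerate(intervals):
        if end < start:
            raise ValueError(f"interval for {phone!r} ends before it starts")
        if i > 0 and abs(start - intervals[i - 1][2]) > 1e-6:
            raise ValueError("intervals must be contiguous (keep silence intervals)")
        boundaries.append(round(end * frames_per_second))
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    phones = [("", 0.0, 0.1), ("HH", 0.1, 0.18), ("AY1", 0.18, 0.42), ("", 0.42, 0.5)]
    print(intervals_to_frame_durations(phones))
```

Because each duration is the difference of two rounded boundaries, the durations always sum to the rounded frame span of the whole utterance. This matters when they are later used to expand phoneme features to exactly the number of mel frames.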

Maintenance & Community

The project is associated with the NATSpeech research group. Links to Hugging Face demos and Chinese documentation are provided.

Licensing & Compatibility

The project is released under a custom license that explicitly prohibits using the technology to generate an individual's speech without their consent, warning that doing so could violate copyright law.

Limitations & Caveats

The license's restriction on generating an individual's speech without consent may limit commercial use and some research applications. Installation also expects particular PyTorch (1.9.0+) and NumPy (1.19.1) versions and relies on system-level dependencies such as sox and libsox-fmt-mp3.

Health Check
Last commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 9 stars in the last 90 days
