parler-tts by huggingface

TTS library for high-quality speech generation, based on a research paper

created 1 year ago
5,373 stars

Top 9.5% on sourcepulse

Project Summary

Parler-TTS is a lightweight, open-source library for generating high-quality, natural-sounding speech with controllable speaker characteristics and speaking style. It is designed for researchers and developers looking to build or integrate advanced text-to-speech capabilities into their applications, offering both inference and training functionalities.

How It Works

Parler-TTS conditions speech generation on a natural-language description. Users control attributes such as gender, pitch, and speaking style through a descriptive text prompt that is supplied alongside the text to be spoken. The library also supports selecting specific speakers by name, which keeps the voice consistent across generations. Recent updates add compatibility with SDPA and Flash Attention 2, along with model compilation for faster inference.
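As an illustration of how such a description might be assembled, here is a minimal sketch in plain Python. The helper `build_description` and its parameters are hypothetical and not part of the library; the library only needs the resulting string. The exact wording that works best is an assumption and may differ between checkpoints.

```python
from typing import Optional


def build_description(
    speaker: Optional[str] = None,
    gender: str = "female",
    pitch: str = "moderate",
    pace: str = "moderate",
    clear: bool = True,
) -> str:
    """Compose a free-text voice description (hypothetical helper).

    If `speaker` is given, the description refers to that speaker by name;
    otherwise it falls back to a generic gendered speaker.
    """
    subject = f"{speaker}'s voice" if speaker else f"A {gender} speaker's voice"
    if clear:
        quality = (
            "The recording is of very high quality, "
            "with the voice sounding clear and very close up."
        )
    else:
        quality = "The recording is noisy, with noticeable background noise."
    return f"{subject} has a {pitch} pitch and a {pace} speaking pace. {quality}"


# Example: a generic description and a named-speaker description.
generic = build_description(pitch="low", pace="slow")
named = build_description(speaker="Jon", clear=False)
```

Which speaker names a checkpoint recognizes depends on its training data, so "Jon" above should be read as a placeholder to check against the model card.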

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/huggingface/parler-tts.git
  • Apple Silicon users may need: pip3 install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
  • Requires PyTorch, Transformers, and soundfile. GPU recommended for optimal performance.
  • Demo available at: https://huggingface.co/spaces/parler-tts/parler-tts
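Once the package is installed, a basic inference run might look like the sketch below. It follows the usage pattern from the project's README as best recalled; the checkpoint id `parler-tts/parler-tts-mini-v1`, the output file name, and the wrapper function `synthesize` are assumptions made for illustration. The heavy imports sit inside the function so that merely defining it does not load any model.

```python
def synthesize(
    prompt: str,
    description: str,
    model_name: str = "parler-tts/parler-tts-mini-v1",  # assumed checkpoint id
    out_path: str = "parler_tts_out.wav",               # assumed output file
) -> str:
    """Generate speech for `prompt` in the voice described by `description`."""
    # Imported lazily: these require the packages listed above.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(model_name).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # The description steers the voice; the prompt is the text to be spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path


# Usage (downloads the checkpoint on first run):
# synthesize(
#     "Hey, how are you doing today?",
#     "A female speaker with a moderate pitch speaks clearly at a moderate pace.",
# )
```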

Highlighted Details

  • Offers two new checkpoints: Parler-TTS Mini (880M parameters) and Parler-TTS Large (2.3B parameters), trained on 45k hours of audiobook data.
  • Supports fine-tuning on custom datasets and provides a training guide.
  • Inference can be optimized using SDPA, torch.compile, and streaming.
  • Allows control over audio quality (clear vs. noisy) and prosody via punctuation in prompts.
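The SDPA and `torch.compile` optimizations mentioned above can be sketched roughly as follows. This is an assumption-laden outline rather than the project's reference code: `attn_implementation` is the standard Transformers loading argument, the static-cache setting follows the project's inference guide as best recalled, and `load_optimized`, `SUPPORTED_ATTENTION`, and the default checkpoint id are hypothetical names chosen for this example.

```python
from typing import Optional

# Attention backends commonly accepted by Transformers' `attn_implementation`.
SUPPORTED_ATTENTION = ("eager", "sdpa", "flash_attention_2")


def load_optimized(
    model_name: str = "parler-tts/parler-tts-mini-v1",  # assumed checkpoint id
    attn_implementation: str = "sdpa",
    compile_mode: Optional[str] = None,  # e.g. "default" or "reduce-overhead"
):
    """Load a model with a chosen attention backend and optional compilation."""
    if attn_implementation not in SUPPORTED_ATTENTION:
        raise ValueError(
            f"attn_implementation must be one of {SUPPORTED_ATTENTION}, "
            f"got {attn_implementation!r}"
        )

    # Imported lazily so the validation above works without a GPU stack.
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    dtype = torch.bfloat16 if device != "cpu" else torch.float32

    model = ParlerTTSForConditionalGeneration.from_pretrained(
        model_name, attn_implementation=attn_implementation
    ).to(device, dtype=dtype)

    if compile_mode is not None:
        # Compilation is assumed to need a static cache so tensor shapes stay fixed.
        model.generation_config.cache_implementation = "static"
        model.forward = torch.compile(model.forward, mode=compile_mode)
    return model
```

Compiled models typically need one or two warm-up generations before the speed-up shows, so the gain is most visible when many utterances are generated in one session.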

Maintenance & Community

The project is actively developed by Hugging Face, with contributions welcomed for improving datasets, training methods (e.g., PEFT, multilingual training), optimization, and evaluation.

Licensing & Compatibility

Released under the permissive Apache 2.0 license, enabling community development and commercial use.

Limitations & Caveats

Inference speed depends on hardware, and a GPU is recommended for best results. PEFT compatibility and multilingual training are not yet available; the README lists both as areas of ongoing work.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 160 stars in the last 90 days