parler-tts by huggingface

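TTS library for high-quality speech generation, reproducing the paper "Natural language guidance of high-fidelity text-to-speech with synthetic annotations" by Dan Lyth and Simon King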

Created 2 years ago

5,534 stars

Top 9.0% on SourcePulse


3 Experts Love This Project

Tim J. Baek

Founder of Open WebUI

Gabriel Almeida

Cofounder of Langflow

Omar Sanseviero

DevRel at Google DeepMind

Project Summary

Parler-TTS is a lightweight, open-source library for generating high-quality, natural-sounding speech with controllable speaker characteristics and speaking style. It is aimed at researchers and developers who want to build or integrate text-to-speech capabilities into their applications, and it provides code for both inference and training.

How It Works

Parler-TTS generates speech from two text inputs: the transcript to be spoken and a natural-language description of how it should sound. The description controls attributes such as gender, pitch, and speaking style. Specific speakers can also be selected by name in the description, which keeps the voice consistent across generations. Recent updates add compatibility with SDPA and Flash Attention 2, as well as model compilation for faster inference.
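Because the voice is specified in plain text, a description can be assembled programmatically. The sketch below uses a hypothetical helper, build_description, whose attribute wording is an assumption modeled on the descriptive style of Parler-TTS prompts rather than a vocabulary the model requires; "Jon" in the usage is likewise only an illustrative speaker name.

```python
from typing import Optional


def build_description(
    gender: str = "female",
    pitch: str = "moderate",
    speed: str = "moderate",
    style: str = "slightly expressive",
    speaker: Optional[str] = None,
    clear_audio: bool = True,
) -> str:
    """Compose a natural-language voice description (hypothetical helper).

    The phrasing is an illustrative assumption, not a fixed Parler-TTS API.
    """
    # A named speaker replaces the generic gendered subject.
    subject = speaker if speaker else f"A {gender} speaker"
    # Quality keywords: "very clear audio" vs. "very noisy audio".
    quality = "very clear audio" if clear_audio else "very noisy audio"
    return (
        f"{subject} delivers {style} speech with a {speed} speed "
        f"and {pitch} pitch. The recording has {quality}."
    )


if __name__ == "__main__":
    print(build_description())
    print(build_description(speaker="Jon", speed="fast", clear_audio=False))
```

In the library itself, per the project README, the description is tokenized into input_ids and the transcript into prompt_input_ids, and both are passed to ParlerTTSForConditionalGeneration.generate; this helper only produces the description string.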

Quick Start & Requirements

Install via pip: pip install git+https://github.com/huggingface/parler-tts.git
Apple Silicon users may need a nightly PyTorch build: pip3 install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
Requires PyTorch, Transformers, and soundfile. GPU recommended for optimal performance.
Demo available at: https://huggingface.co/spaces/parler-tts/parler-tts
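Before loading a model, it can help to confirm that the listed dependencies are importable. This minimal sketch uses only the Python standard library; the module names, including parler_tts as the import name provided by the pip install above, are assumptions based on the requirements listed here.

```python
import importlib.util
from typing import Dict, Iterable, List

# Assumed import names for the requirements listed above.
REQUIRED_MODULES = ("torch", "transformers", "soundfile", "parler_tts")


def check_requirements(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each module name to whether it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def missing_requirements(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the modules that are not importable in this environment."""
    return [name for name, ok in check_requirements(modules).items() if not ok]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All requirements found.")
```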

Highlighted Details

Offers two new checkpoints: Parler-TTS Mini (880M parameters) and Parler-TTS Large (2.3B parameters), trained on 45k hours of audiobook data.
Supports fine-tuning on custom datasets and provides a training guide.
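Inference can be sped up with SDPA and torch.compile, and streaming lets playback start before generation finishes.
Audio quality can be steered through the description (e.g., "very clear audio" vs. "very noisy audio"), and prosody through punctuation in the transcript, such as commas to add short pauses.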
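The parameter counts above give a quick lower bound on the memory needed for the weights alone: parameters times bytes per parameter. This sketch assumes 4 bytes per parameter for float32 and 2 for float16/bfloat16, and ignores activations, caches, and framework overhead, so actual usage will be higher.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

# Parameter counts as quoted above.
CHECKPOINTS = {"Parler-TTS Mini": 880e6, "Parler-TTS Large": 2.3e9}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in CHECKPOINTS.items():
        for dtype in ("float32", "float16"):
            print(f"{name} ({dtype}): ~{weight_memory_gb(params, dtype):.2f} GB")
```

By this estimate the Mini checkpoint needs roughly 1.76 GB in half precision and the Large checkpoint roughly 4.6 GB, before any runtime overhead.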

Maintenance & Community

The project is developed by Hugging Face, and the README welcomes contributions for improving datasets, training methods (e.g., PEFT, multilingual training), optimization, and evaluation.

Licensing & Compatibility

Released under the Apache 2.0 license, a permissive license that allows commercial use and community development.

Limitations & Caveats

Inference speed depends heavily on hardware, and a GPU is recommended. The README lists features that were still planned, such as PEFT compatibility and multilingual training.
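One way to compare hardware setups is the real-time factor (RTF): wall-clock generation time divided by the duration of the generated audio, where values below 1.0 mean speech is produced faster than it plays. The sketch below is a generic measurement helper, not part of parler-tts; fn is a hypothetical callable that runs one generation and returns the number of audio samples produced.

```python
import time
from typing import Callable, Tuple


def real_time_factor(generation_seconds: float, num_samples: int, sampling_rate: int) -> float:
    """RTF = generation time / duration of the generated audio.

    Values below 1.0 mean audio is generated faster than real time.
    """
    audio_seconds = num_samples / sampling_rate
    if audio_seconds <= 0:
        raise ValueError("generated audio must be non-empty")
    return generation_seconds / audio_seconds


def timed(fn: Callable[[], int]) -> Tuple[float, int]:
    """Run fn (which returns a sample count) and measure its wall-clock time."""
    start = time.perf_counter()
    num_samples = fn()
    return time.perf_counter() - start, num_samples


if __name__ == "__main__":
    # 2 s of compute for 4 s of 44.1 kHz audio gives an RTF of 0.5.
    print(real_time_factor(2.0, 44100 * 4, 44100))
```

With a real model, fn might wrap a generate call and return the length of the output array, with sampling_rate taken from model.config.sampling_rate.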

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

21 stars in the last 30 days