parler-tts by huggingface

TTS library for high-quality speech generation, based on a research paper

Created 1 year ago
5,416 stars

Top 9.4% on SourcePulse

Project Summary

Parler-TTS is a lightweight, open-source library for generating high-quality, natural-sounding speech with controllable speaker characteristics and speaking style. It is aimed at researchers and developers who want to build or integrate advanced text-to-speech capabilities into their applications, and it provides code for both inference and training.

How It Works

Parler-TTS reproduces the approach of the paper "Natural language guidance of high-fidelity text-to-speech with synthetic annotations" (Dan Lyth and Simon King): alongside the text to be spoken, the model receives a plain-language description of the desired voice. Users control aspects such as gender, pitch, and speaking style through this description, and can request a speaker seen in training by name to get a consistent voice across generations. Recent updates add support for SDPA and Flash Attention 2, along with model compilation for faster inference.
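A minimal sketch of description-guided generation, following the usage pattern in the project README as we understand it: the transcript and the voice description are tokenized separately and passed to `generate` as `prompt_input_ids` and `input_ids`. The `build_description` and `synthesize` helpers and their wording are hypothetical, not part of the library; the checkpoint `parler-tts/parler-tts-mini-v1`, the use of one tokenizer for both inputs, and the speaker name "Jon" are assumptions to check against the model card.

```python
from typing import Optional


# Hypothetical helper: Parler-TTS accepts free-form description text, and this
# template is only one illustrative way of phrasing it.
def build_description(
    gender: str = "female",
    pitch: str = "moderate",
    speed: str = "moderate",
    style: str = "slightly expressive",
    speaker: Optional[str] = None,
    clear_audio: bool = True,
) -> str:
    # Naming a speaker replaces the generic gender phrase.
    subject = f"{speaker}'s voice" if speaker else f"A {gender} speaker's voice"
    quality = "very clear audio" if clear_audio else "very noisy audio"
    return (
        f"{subject} is {style}, with a {pitch} pitch and a {speed} speed. "
        f"The recording has {quality}."
    )


def synthesize(text: str, description: str, out_path: str = "out.wav",
               repo_id: str = "parler-tts/parler-tts-mini-v1") -> None:
    # Heavy imports are deferred so build_description works without them.
    import torch
    import soundfile as sf
    from transformers import AutoTokenizer
    from parler_tts import ParlerTTSForConditionalGeneration

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(repo_id).to(device)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)

    # The description conditions the voice; the prompt is the text to speak.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio, model.config.sampling_rate)
```

Usage would look like `synthesize("Hey, how are you doing today?", build_description(speaker="Jon"))`. Keeping the description builder separate from model loading makes it easy to vary style while reusing one loaded model in a real application.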

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/huggingface/parler-tts.git
  • Apple Silicon users may need a nightly PyTorch build (for bfloat16 support): pip3 install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
  • Requires PyTorch, Transformers, and soundfile. GPU recommended for optimal performance.
  • Demo available at: https://huggingface.co/spaces/parler-tts/parler-tts
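Before running any examples, a quick check that the listed dependencies are importable and which accelerator PyTorch can see may save a failed run. This is a hypothetical helper, not part of Parler-TTS; the `mps` branch targets Apple Silicon, and whether every Parler-TTS operation runs on that backend is an assumption we have not verified.

```python
import importlib.util

# Top-level import names for the requirements listed above (parler_tts is the
# module installed by the pip command).
REQUIRED = ("torch", "transformers", "soundfile", "parler_tts")


def missing_dependencies(modules=REQUIRED):
    """Return the modules that cannot be found by the current interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def pick_device() -> str:
    """Prefer a CUDA GPU, then Apple's MPS backend, and fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda:0"
    # Older PyTorch builds have no MPS backend at all, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", missing or "none")
    print("device:", pick_device())
```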

Highlighted Details

  • Offers two new checkpoints: Parler-TTS Mini (880M parameters) and Parler-TTS Large (2.3B parameters), trained on 45k hours of audiobook data.
  • Supports fine-tuning on custom datasets and provides a training guide.
  • Inference can be optimized using SDPA, torch.compile, and streaming.
  • Allows control over audio quality (clear vs. noisy) through keywords in the description, and over prosody through punctuation in the input text (e.g., commas add short pauses).
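The SDPA / Flash Attention 2 and `torch.compile` options can be wired up roughly as follows. This is a sketch based on our reading of the project's inference documentation: passing `attn_implementation` to `from_pretrained` is the standard Transformers mechanism, while setting a static cache and compiling `model.forward` are assumptions to verify against the repository. `InferenceConfig` and `load_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Values accepted by Transformers' `attn_implementation` argument and a subset
# of torch.compile's `mode` options.
ATTN_IMPLEMENTATIONS = {"eager", "sdpa", "flash_attention_2"}
COMPILE_MODES = {"default", "reduce-overhead", "max-autotune"}


@dataclass(frozen=True)
class InferenceConfig:  # hypothetical container, not a Parler-TTS class
    repo_id: str = "parler-tts/parler-tts-mini-v1"
    attn_implementation: str = "sdpa"
    compile_mode: Optional[str] = None  # None disables torch.compile
    bf16: bool = True

    def __post_init__(self):
        # Fail fast on typos instead of deep inside model loading.
        if self.attn_implementation not in ATTN_IMPLEMENTATIONS:
            raise ValueError(f"unknown attn_implementation: {self.attn_implementation!r}")
        if self.compile_mode is not None and self.compile_mode not in COMPILE_MODES:
            raise ValueError(f"unknown compile_mode: {self.compile_mode!r}")


def load_model(cfg: InferenceConfig, device: str = "cuda:0"):
    # Deferred imports keep the config logic usable without torch installed.
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration

    dtype = torch.bfloat16 if cfg.bf16 else torch.float32
    model = ParlerTTSForConditionalGeneration.from_pretrained(
        cfg.repo_id, attn_implementation=cfg.attn_implementation
    ).to(device, dtype=dtype)

    if cfg.compile_mode is not None:
        # Assumption: a static KV cache keeps tensor shapes fixed, which avoids
        # repeated recompilation as the generated sequence grows.
        model.generation_config.cache_implementation = "static"
        model.forward = torch.compile(model.forward, mode=cfg.compile_mode)
    return model
```

With compilation enabled, expect the first one or two `generate` calls to be slow while graphs are built; the speed-up only shows on later calls.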

Maintenance & Community

The project is developed by Hugging Face, which welcomes contributions that improve datasets, training methods (e.g., PEFT, multilingual training), optimization, and evaluation.

Licensing & Compatibility

Released under the Apache 2.0 license, a permissive license that allows community development and commercial use.

Limitations & Caveats

Generation speed depends heavily on hardware, and a GPU is recommended despite the inference optimizations. The README lists PEFT compatibility and multilingual training as planned work.

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 40 stars in the last 30 days

Explore Similar Projects

Starred by Tim J. Baek (Founder of Open WebUI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 4 more.

StyleTTS2 by yl4579

0.2% · 6k stars
Text-to-speech model achieving human-level synthesis
Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pietro Schirano (Founder of MagicPath), and 2 more.

metavoice-src by metavoiceio

0.1% · 4k stars
TTS model for human-like, expressive speech
Created 1 year ago · Updated 1 year ago
Starred by Omar Sanseviero (DevRel at Google DeepMind), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 2 more.

ChatTTS by 2noise

0.2% · 38k stars
Generative speech model for daily dialogue
Created 1 year ago · Updated 2 months ago