TTS-papers by coqui-ai

Collection of TTS research papers

Created 5 years ago
724 stars

Top 47.6% on SourcePulse

Project Summary

This repository serves as a curated collection of research papers and summaries related to Text-to-Speech (TTS) synthesis. It aims to provide engineers and researchers with a centralized resource for understanding the evolution and various approaches in TTS technology, from foundational models to recent advancements.

How It Works

The repository organizes papers by key TTS concepts such as phoneme/character representations, transfer learning, attention mechanisms, non-autoregressive models, multi-speaker synthesis, and vocoders. Each entry typically includes a link to the paper, a brief summary of its core methodology, and sometimes personal insights or experimental observations.

Highlighted Details

  • Covers a wide range of TTS architectures including Tacotron, FastSpeech, Glow-TTS, and GAN-based approaches.
  • Details various techniques for alignment, duration prediction, and speaker adaptation.
  • Includes summaries of vocoder models such as WaveNet, MelGAN, and WaveGlow.
  • Features papers on multi-lingual and few-shot TTS adaptation.

Maintenance & Community

This repository appears to be a static collection of links and summaries; no active development or community channels are mentioned.

Licensing & Compatibility

The repository itself does not contain code and is a collection of links to external research papers. The licensing of the linked papers would be governed by their respective publishers.

Limitations & Caveats

This repository is a curated list of papers and does not provide runnable code or implementations. The summaries are subjective and may not cover every nuance of the original research. Some entries include personal opinions, or "2 cents", which should be read as commentary rather than as findings from the papers.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

Few-shot voice cloning and TTS web UI

Created 2 years ago
Updated 2 weeks ago
55k stars

Top 0.4% on SourcePulse