TTS framework for efficient, conversational speech generation (research paper)
Pheme is an open-source framework for training efficient and conversational Text-to-Speech (TTS) models, designed for researchers and developers seeking high-quality speech synthesis with reduced data and computational requirements. It enables training Transformer-based models using significantly less data than comparable systems like VALL-E or SoundStorm, while supporting diverse data sources including conversational, podcast, and noisy audio.
How It Works
Pheme separates semantic and acoustic tokens, leveraging a specialized speech tokenizer for efficient representation. This architecture facilitates MaskGit-style parallel inference, achieving up to 15x speed-ups over autoregressive models. The framework emphasizes parameter, data, and inference efficiency, allowing for compact models and low-latency generation.
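The parallel decoding step can be pictured with a short, self-contained sketch. The snippet below is illustrative only: `toy_acoustic_model`, `VOCAB_SIZE`, `MASK_ID`, and the schedule constants are hypothetical stand-ins, conditioning on semantic tokens and a speaker prompt is omitted, and it does not reproduce the Pheme codebase. It only shows how MaskGit-style confidence-based unmasking fills every position in a handful of parallel steps instead of one token at a time.

```python
import math

import torch

# All of the following constants are illustrative; the real codebook size and
# sequence lengths come from Pheme's speech tokenizer and are not shown here.
VOCAB_SIZE = 1024      # acoustic-token codebook size (assumed)
MASK_ID = VOCAB_SIZE   # extra id marking positions that are still masked
SEQ_LEN = 64           # number of acoustic tokens to generate
NUM_STEPS = 8          # a few parallel refinement steps instead of SEQ_LEN sequential ones


def toy_acoustic_model(tokens: torch.Tensor) -> torch.Tensor:
    """Stand-in for the acoustic decoder: per-position logits over the codebook.

    A real model would be conditioned on semantic tokens and a speaker prompt;
    random logits are enough to demonstrate the decoding loop.
    """
    batch, length = tokens.shape
    return torch.randn(batch, length, VOCAB_SIZE)


def maskgit_decode(seq_len: int = SEQ_LEN, num_steps: int = NUM_STEPS) -> torch.Tensor:
    tokens = torch.full((1, seq_len), MASK_ID, dtype=torch.long)  # start fully masked
    for step in range(num_steps):
        logits = toy_acoustic_model(tokens)
        probs = logits.softmax(dim=-1)
        sampled = probs.argmax(dim=-1)            # greedy fill-in, for clarity
        confidence = probs.max(dim=-1).values     # model's certainty per position

        # Positions committed in earlier steps keep their tokens and are never re-masked.
        is_masked = tokens == MASK_ID
        confidence = torch.where(is_masked, confidence,
                                 torch.full_like(confidence, float("inf")))

        # Cosine schedule: the fraction of positions left masked shrinks every step.
        mask_ratio = math.cos(math.pi / 2 * (step + 1) / num_steps)
        num_to_mask = int(seq_len * mask_ratio)

        tokens = torch.where(is_masked, sampled, tokens)  # commit this step's predictions
        if num_to_mask > 0:
            # Re-mask the lowest-confidence predictions and refine them next step.
            remask = confidence.topk(num_to_mask, largest=False).indices
            tokens[0, remask[0]] = MASK_ID
    return tokens


if __name__ == "__main__":
    acoustic_tokens = maskgit_decode()
    # All positions are filled after NUM_STEPS forward passes, not SEQ_LEN.
    print(acoustic_tokens.shape, int((acoustic_tokens == MASK_ID).sum()))
```

Because every remaining position is predicted in each pass, the number of model calls scales with the step count rather than the sequence length, which is where the reported speed-up over token-by-token autoregressive decoding comes from.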
Quick Start & Requirements
Create and activate a Python 3.10 Conda environment, install PyTorch, and then install the remaining dependencies:

```bash
conda create --name pheme3 python=3.10
conda activate pheme3
pip3 install torch torchvision torchaudio
pip3 install -r requirements.txt --no-deps
```

Inference is run with `python transformer_infer.py` (a minimal scripting sketch follows below). The scripts `train_t2s.py` and `train_s2a.py` are provided for training the text-to-semantic (T2S) and semantic-to-acoustic (S2A) components, respectively. Audio preprocessing relies on the `parallel` utility, and GPU acceleration is expected for both training and inference.
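If the entry point needs to be driven from a larger Python workflow rather than the shell, a thin wrapper is enough. The sketch below only shells out to `transformer_infer.py` exactly as documented above; the `run_inference` helper and the GPU availability check are illustrative assumptions, not part of the repository.

```python
# Minimal, assumed wrapper around the documented command `python transformer_infer.py`.
import subprocess
import sys

import torch


def run_inference() -> None:
    """Run the repository's inference script in the current environment."""
    if not torch.cuda.is_available():
        # The summary above notes that GPU acceleration is expected for inference.
        print("Warning: no GPU detected; generation will be slow.", file=sys.stderr)
    subprocess.run([sys.executable, "transformer_infer.py"], check=True)


if __name__ == "__main__":
    run_inference()
```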
Maintenance & Community
The project is associated with PolyAI and the authors of the "Pheme: Efficient and Conversational Speech Generation" paper. Links to demos and audio samples are provided.
Licensing & Compatibility
The repository does not explicitly state a license in the README. This requires further investigation for commercial use or integration into closed-source projects.
Limitations & Caveats
The README does not specify the license, which is a critical factor for adoption. While pre-trained models are available, the setup for training requires careful data preparation and environment configuration.