voicebox-pytorch by lucidrains

PyTorch implementation of MetaAI's Voicebox text-to-speech model

Created 2 years ago

668 stars

Top 50.6% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of MetaAI's Voicebox, a state-of-the-art text-to-speech (TTS) model. It aims to offer a flexible and efficient framework for researchers and developers working on advanced speech synthesis, particularly those interested in multilingual and universal speech generation.

How It Works

The implementation is built around conditional flow matching. It integrates HubertWithKmeans for semantic tokenization and EncodecVoco for audio encoding and decoding, and it supports both text-conditioned and unconditional generation. Time conditioning is applied through adaptive normalization, and sampling can use either torchdiffeq or torchode as the ODE solver.
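To make the flow matching idea concrete, here is a minimal sketch in plain Python. It is not the repository's code. It assumes the optimal-transport probability path commonly used with conditional flow matching, with a hypothetical noise floor `SIGMA_MIN`, and it replaces the torchdiffeq/torchode solvers with a hand-written fixed-step Euler loop. The helper names (`cfm_training_pair`, `euler_sample`) are illustrative only.

```python
import random

SIGMA_MIN = 1e-5  # assumed small noise floor; not taken from the repository


def cfm_training_pair(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return (x_t, target_velocity) on the straight path from noise x0 to data x1.

    x0: noise sample, x1: data sample, t: time in [0, 1].
    """
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    # The velocity a model would be trained to regress at (x_t, t).
    target = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, target


def euler_sample(velocity_fn, x0, steps=32):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def mse(pred, target):
    """Mean squared error, the regression loss applied to predicted velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


rng = random.Random(0)
x1 = [0.5, -1.0, 2.0]                    # stand-in for one frame of audio features
x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # Gaussian noise of the same shape
x_t, target = cfm_training_pair(x0, x1, t=0.3)

# A perfect model would predict `target` everywhere along this path; feeding
# that velocity to the sampler carries the noise sample to the data sample
# (up to the sigma_min * x0 residual).
generated = euler_sample(lambda x, t: target, x0)
```

In an actual model the velocity comes from a neural network conditioned on text or audio context, `t` is sampled at random during training, and an adaptive-step solver may replace the Euler loop.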

Quick Start & Requirements

Install: pip install voicebox-pytorch
Prerequisites: Requires pre-trained checkpoints for HubertWithKmeans (e.g., from fairseq) and potentially a trained TextToSemantic model (e.g., Spear-TTS).
Usage examples for training and sampling are provided in the README.

Highlighted Details

Implements Voicebox, a SOTA TTS network from MetaAI.
Utilizes rotary embeddings and adaptive normalization.
Integrates with Spear-TTS for text-to-semantic conditioning.
Supports both torchdiffeq and torchode for ODE solving.
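As an illustration of the rotary embeddings mentioned above, the following plain-Python sketch rotates consecutive pairs of a query or key vector by position-dependent angles. It is a toy version under stated assumptions, not the library's implementation: the function name `rotary`, the base of 10000, and the consecutive-pair layout are choices made for this example, and real implementations may pair dimensions differently and operate on batched tensors.

```python
import math


def rotary(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles that grow with `position`."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("rotary embeddings need an even dimension")
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[i], vec[i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Attention scores depend only on the relative offset between positions:
score_a = dot(rotary(q, 5), rotary(k, 2))     # offset 3
score_b = dot(rotary(q, 13), rotary(k, 10))   # offset 3, shifted by 8
```

The two scores agree up to floating-point error, which is the property that lets attention encode relative position without learned position embeddings.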

Maintenance & Community

The project has received sponsorship from StabilityAI and an Imminent Grant. Notable contributors include Bryan Chiang and Lucas Newman. Community links are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify the licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

The author recommends using E2 TTS instead of this implementation. Some items, such as correctly handling MelVoco encode settings and specifying sampling duration in seconds, are still marked as "to-do", so these features should be treated as unfinished.

Explore Similar Projects

VoiceFlow-TTS by X-LANCE

Meta-voicebox by SpeechifyInc

f5-tts-mlx by lucasnewman

vits2 by daniilrobnikov

FireRedTTS by FireRedTeam

parrots by shibing624

vits-simple-api by Artrajz

MARS5-TTS by Camb-ai

IMS-Toucan by DigitalPhonetics

RealtimeTTS by KoljaB

TTS by coqui-ai

GPT-SoVITS by RVC-Boss