X-Codec-2.0 by zhenye234

Codec for speech synthesis research (LLaSA paper)

Created 1 year ago

349 stars

Top 80.0% on SourcePulse

Project Summary

This repository provides X-Codec-2.0, the speech codec used for LLaMA-based speech synthesis in the LLaSA paper. It is aimed at researchers and developers working on speech synthesis and processing who need an efficient, high-quality discrete representation of speech.

How It Works

X-Codec-2.0 combines a Transformer architecture with a Vocos decoder for high-quality speech reconstruction. It uses a single vector quantizer with a codebook of 65,536 entries, implemented with Finite Scalar Quantization, and reaches codebook usage comparable to that of text tokenizers. The semantic encoder is based on Wav2Vec2-BERT, which was pre-trained on extensive multilingual audio data and so enables broad language support.
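
To make the quantization step more concrete, here is a minimal Python sketch of how a finite-scalar-style quantizer can collapse a latent vector into a single token index. It assumes eight latent dimensions with four levels each, chosen only because 4^8 = 65536 matches the stated codebook size. The repository's actual level configuration and bounding function may differ, and the names `LEVELS`, `quantize`, `digits_to_index`, and `index_to_digits` are hypothetical, not taken from the project.

```python
import math
from typing import List, Sequence

# Assumption for illustration: 8 dimensions x 4 levels each, since 4**8 == 65536.
LEVELS: List[int] = [4] * 8


def codebook_size(levels: Sequence[int]) -> int:
    """Number of distinct codes implied by the per-dimension level counts."""
    return math.prod(levels)


def quantize(z: Sequence[float], levels: Sequence[int]) -> List[int]:
    """Bound each coordinate with tanh, then round it to a level in [0, n - 1]."""
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    digits = []
    for x, n in zip(z, levels):
        unit = (math.tanh(x) + 1.0) / 2.0  # squash the coordinate into [0, 1]
        digits.append(int(round(unit * (n - 1))))
    return digits


def digits_to_index(digits: Sequence[int], levels: Sequence[int]) -> int:
    """Pack the per-dimension levels into one token index (mixed-radix encoding)."""
    index, stride = 0, 1
    for d, n in zip(digits, levels):
        index += d * stride
        stride *= n
    return index


def index_to_digits(index: int, levels: Sequence[int]) -> List[int]:
    """Inverse of digits_to_index: recover the per-dimension levels from a token."""
    digits = []
    for n in levels:
        digits.append(index % n)
        index //= n
    return digits


if __name__ == "__main__":
    latent = [-2.0, -0.3, 0.0, 0.4, 1.5, -1.0, 0.8, 2.5]  # made-up encoder output
    digits = quantize(latent, LEVELS)
    token = digits_to_index(digits, LEVELS)
    print(codebook_size(LEVELS), digits, token)
```

Because the codebook is implicit in the grid of levels rather than a learned table, every index in the range is reachable by construction, which is one reason this style of quantizer tends to show high codebook usage.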

Quick Start & Requirements

Install: Clone the repo, create a conda environment (conda create --name xcodec2 python=3.9), activate it (conda activate xcodec2), and install dependencies (pip install -r requirements.txt).
Prerequisites: Python 3.9. Pretrained checkpoints must be downloaded separately.
Setup Time: Estimated to be under 15 minutes for environment setup and checkpoint download.

Highlighted Details

Achieves 99% codebook usage with Finite Scalar Quantization.
Supports multilingual speech semantics in more than 143 languages.
Trained on 150k hours of multilingual speech data.
Reports strong reconstruction metrics: UTMOS 4.13, WER 2.47%, STOI 0.92.
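
As a rough illustration of what the codebook-usage figure measures, the sketch below computes the fraction of codebook entries that appear at least once in a stream of token indices, along with the number of bits a single token carries for a given codebook size. The function names and the toy token list are hypothetical; the project's own evaluation code may define usage differently (for example, over a specific test set).

```python
import math
from typing import Iterable


def codebook_usage(tokens: Iterable[int], codebook_size: int) -> float:
    """Fraction of codebook entries that occur at least once in `tokens`."""
    if codebook_size <= 0:
        raise ValueError("codebook_size must be positive")
    seen = set()
    for t in tokens:
        if not 0 <= t < codebook_size:
            raise ValueError(f"token {t} is outside [0, {codebook_size})")
        seen.add(t)
    return len(seen) / codebook_size


def bits_per_token(codebook_size: int) -> float:
    """Information carried by one token if all entries were equally likely."""
    return math.log2(codebook_size)


if __name__ == "__main__":
    toy_tokens = [0, 1, 1, 2]  # made-up indices for a 4-entry codebook
    print(codebook_usage(toy_tokens, codebook_size=4))
    print(bits_per_token(65536))
```

Under this definition, 99% usage of a 65,536-entry codebook would mean that roughly 64,880 distinct indices occur in the evaluated data.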

Maintenance & Community

The project is primarily maintained by zhenye234. The codebase is largely borrowed from BigCodec. No specific community channels or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. Given the reliance on BigCodec, users should verify the licensing terms of that project for compatibility, especially for commercial use.

Limitations & Caveats

The project is presented as a research artifact. Version 0.1.5 is recommended for inference and fine-tuning, while version 0.1.3 is suggested for stability during codec training. Dependency versions outside the tested environment may cause problems.
