Codec for speech synthesis research (LLaSA paper)
This repository provides X-Codec-2.0, a speech codec designed for LLaMA-based speech synthesis, as detailed in the LLaSA paper. It targets researchers and developers in speech synthesis and processing, offering efficient and high-quality speech representation.
How It Works
X-Codec-2.0 uses a Transformer architecture combined with a Vocos decoder for high-quality speech reconstruction. It employs Single Vector Quantization with a codebook of 65,536 entries and achieves high codebook usage, comparable to text tokenizers. The semantic encoder is based on Wav2Vec2-BERT, which is pre-trained on extensive multilingual audio data and enables broad language support.
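To make the codebook figures concrete, the following is a minimal sketch of the arithmetic behind a single 65,536-entry codebook. It is not taken from the repository. The function names are hypothetical, and the token rate passed to `bitrate_bps` is an illustrative assumption, since the summary above does not state the codec's frame rate.

```python
import math

CODEBOOK_SIZE = 65536  # codebook size stated in the description above


def bits_per_token(codebook_size: int) -> float:
    """Bits needed to index one entry of a single codebook."""
    return math.log2(codebook_size)


def bitrate_bps(codebook_size: int, tokens_per_second: float) -> float:
    """Bitrate of a single-codebook token stream.

    tokens_per_second is an assumed input, not a value from the source.
    """
    return bits_per_token(codebook_size) * tokens_per_second


def codebook_usage(token_ids, codebook_size: int) -> float:
    """Fraction of codebook entries that occur at least once in token_ids."""
    used = {t for t in token_ids if 0 <= t < codebook_size}
    return len(used) / codebook_size


if __name__ == "__main__":
    print(bits_per_token(CODEBOOK_SIZE))   # bits per token
    print(bitrate_bps(CODEBOOK_SIZE, 50))  # 50 tokens/s is an illustrative value
    print(codebook_usage([0, 1, 1, 2], 4))
```

Codebook usage in this sense would be measured over the tokens produced for a large evaluation set; a value near 1.0 means almost every entry is actually used.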
Quick Start & Requirements
Create a conda environment (conda create --name xcodec2 python=3.9), activate it (conda activate xcodec2), and install dependencies (pip install -r requirements.txt).
Highlighted Details
Maintenance & Community
The project is primarily maintained by zhenye234. The codebase is largely borrowed from BigCodec. No specific community channels or roadmap are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. Given the reliance on BigCodec, users should verify the licensing terms of that project for compatibility, especially for commercial use.
Limitations & Caveats
The project is presented as a research artifact. While version 0.1.5 is recommended for inference and fine-tuning, version 0.1.3 is suggested for stability during codec training. Potential issues may arise with dependencies beyond the tested environment.
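The version guidance above can be checked programmatically. The sketch below uses only the Python standard library. It assumes the codec is installed as a package named `xcodec2`, which this summary does not confirm. The task labels and helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions recommended in the text above, keyed by hypothetical task labels.
RECOMMENDED = {
    "inference": "0.1.5",
    "fine-tuning": "0.1.5",
    "codec-training": "0.1.3",
}


def recommended_version(task: str) -> str:
    """Return the version suggested for the given task."""
    return RECOMMENDED[task]


def installed_version(package: str = "xcodec2"):
    """Return the installed version string, or None if the package is absent.

    The default package name is an assumption, not confirmed by the source.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check(task: str, package: str = "xcodec2") -> str:
    """Describe whether the installed version matches the recommendation."""
    want = recommended_version(task)
    have = installed_version(package)
    if have is None:
        return f"{package} is not installed; recommended for {task}: {want}"
    if have == want:
        return f"{package} {have} matches the recommendation for {task}"
    return f"{package} {have} installed; recommended for {task}: {want}"


if __name__ == "__main__":
    for task in RECOMMENDED:
        print(check(task))
```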