X-Codec-2.0  by zhenye234

Codec for speech synthesis research (LLaSA paper)

created 7 months ago
287 stars

Top 92.3% on sourcepulse

Project Summary

This repository provides X-Codec-2.0, the speech codec used for LLaMA-based speech synthesis in the LLaSA paper. It targets researchers and developers in speech synthesis and processing who need an efficient, high-quality discrete speech representation.

How It Works

X-Codec-2.0 pairs a Transformer with a Vocos decoder for high-quality speech reconstruction. It quantizes speech with a single vector quantizer whose codebook has 65,536 entries (built with Finite Scalar Quantization, as noted under Highlighted Details), and it reports codebook usage high enough to be comparable to text tokenizers. The semantic encoder is based on Wav2Vec2-BERT, which is pre-trained on extensive multilingual audio data and so enables broad language support.
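
To make the quantization step concrete, here is a minimal NumPy sketch of Finite Scalar Quantization feeding a single 65,536-entry codebook. It is an illustration under stated assumptions, not the repository's implementation: the choice of 8 latent dimensions with 4 levels each (4^8 = 65,536), the tanh bounding, and the function names are all hypothetical.

```python
import numpy as np

# Assumption: 8 latent dimensions with 4 levels each, so 4**8 = 65536 codes.
# The real model's level configuration may differ.
LEVELS = np.array([4] * 8)


def fsq_quantize(z, levels=LEVELS):
    """Bound each latent dimension, then round it to one of levels[i] integers."""
    # tanh maps to (-1, 1); rescale to (0, L-1) and round to the nearest level.
    bounded = (np.tanh(z) + 1.0) / 2.0 * (levels - 1)
    return np.rint(bounded).astype(np.int64)  # digits in {0, ..., L-1}


def digits_to_index(digits, levels=LEVELS):
    """Pack per-dimension digits into one token id (mixed-radix encoding)."""
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    return (digits * basis).sum(axis=-1)


def index_to_digits(index, levels=LEVELS):
    """Unpack a token id back into its per-dimension digits."""
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    return (np.asarray(index)[..., None] // basis) % levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(5, 8))       # 5 hypothetical latent frames
    tokens = digits_to_index(fsq_quantize(frames))
    print(tokens.shape, int(np.prod(LEVELS)))
```

Because every frame maps to exactly one integer below 65,536, each token carries log2(65536) = 16 bits, which is what lets a language model treat speech tokens much like text tokens.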

Quick Start & Requirements

  • Install: Clone the repo, create a conda environment (conda create --name xcodec2 python=3.9), activate it (conda activate xcodec2), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.9. Pretrained checkpoints must be downloaded separately.
  • Setup Time: Estimated to be under 15 minutes for environment setup and checkpoint download.

Highlighted Details

  • Achieves 99% codebook usage with Finite Scalar Quantization.
  • Supports multilingual speech semantics covering more than 143 languages.
  • Trained on 150k hours of multilingual speech data.
  • Reports strong reconstruction metrics: UTMOS 4.13, WER 2.47, STOI 0.92.
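
The codebook-usage figure in the list above can be read as the fraction of codebook entries that appear at least once in a set of encoded tokens. The sketch below assumes that definition and a hypothetical helper name; the source does not say how the 99% value was measured.

```python
import numpy as np


def codebook_usage(token_ids, codebook_size=65536):
    """Fraction of codebook entries that occur at least once in token_ids."""
    ids = np.asarray(token_ids).ravel()
    if ids.size and (ids.min() < 0 or ids.max() >= codebook_size):
        raise ValueError("token id outside the codebook range")
    return np.unique(ids).size / codebook_size


if __name__ == "__main__":
    # Toy example with a 4-entry codebook: entries 0, 1 and 3 are used.
    print(codebook_usage([0, 1, 1, 3], codebook_size=4))
```

A meaningful estimate needs far more tokens than codebook entries, so in practice the measurement would be run over a large evaluation set rather than a handful of utterances.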

Maintenance & Community

The project is primarily maintained by zhenye234. The codebase is largely borrowed from BigCodec. No specific community channels or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. Given the reliance on BigCodec, users should verify the licensing terms of that project for compatibility, especially for commercial use.

Limitations & Caveats

The project is presented as a research artifact. Version 0.1.5 is recommended for inference and fine-tuning, while version 0.1.3 is suggested for stability during codec training. Dependency issues may arise outside the tested environment.
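
The version guidance above can be encoded as a small preflight check. This is a sketch only: the task labels are illustrative, and the distribution name `xcodec2` is an assumption, since the summary does not say which package the version numbers refer to.

```python
from importlib import metadata

# Versions come from the caveat above; the task labels are illustrative.
RECOMMENDED = {
    "inference": "0.1.5",
    "fine-tuning": "0.1.5",
    "codec-training": "0.1.3",
}


def recommended_version(task):
    """Return the version suggested for a task, or raise on an unknown task."""
    try:
        return RECOMMENDED[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


def installed_version(package="xcodec2"):
    """Return the installed version, or None if the package is absent.

    Assumption: the codec is installed under the distribution name 'xcodec2'.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    task = "codec-training"
    print(task, "wants", recommended_version(task), "found", installed_version())
```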

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 4

Star History

25 stars in the last 90 days
