LLaSA_training by zhenye234

Training code for LLaSA, a LLaMA-based speech synthesis model described in an accompanying research paper

Created 8 months ago
615 stars

Top 53.5% on SourcePulse

Project Summary

LLaSA is a framework for speech synthesis that scales both training and inference compute for LLaMA-based models. It targets researchers and developers working on large-scale, multilingual text-to-speech systems, and it handles text tokens and speech tokens in a single, unified token space.

How It Works

LLaSA uses a unified tokenizer that combines the text tokens of Llama models with speech tokens derived from X-codec2. Because text and speech share one token space, a speech synthesis model can be trained end to end, and compute can be scaled for both training and inference.
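
The idea of a unified token space can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the class `UnifiedVocab`, the helper `build_sequence`, and the sizes 1000 and 16 are all hypothetical, and the layout (speech ids appended after the text ids) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class UnifiedVocab:
    """Toy unified vocabulary: text ids come first, speech-code ids follow.

    Both sizes are hypothetical placeholders, not the real Llama or
    X-codec2 sizes.
    """
    text_vocab_size: int
    codebook_size: int

    @property
    def size(self) -> int:
        # Total number of ids the model's embedding table would need.
        return self.text_vocab_size + self.codebook_size

    def speech_to_id(self, code: int) -> int:
        # Shift a codec code into the id range reserved for speech.
        if not 0 <= code < self.codebook_size:
            raise ValueError(f"speech code {code} out of range")
        return self.text_vocab_size + code

    def is_speech(self, token_id: int) -> bool:
        return self.text_vocab_size <= token_id < self.size

    def id_to_speech(self, token_id: int) -> int:
        # Inverse of speech_to_id, used when decoding generated ids to audio codes.
        if not self.is_speech(token_id):
            raise ValueError(f"id {token_id} is not a speech token")
        return token_id - self.text_vocab_size


def build_sequence(vocab: UnifiedVocab,
                   text_ids: Sequence[int],
                   speech_codes: Sequence[int]) -> List[int]:
    """Concatenate text ids and shifted speech codes into one training sequence."""
    return list(text_ids) + [vocab.speech_to_id(c) for c in speech_codes]


vocab = UnifiedVocab(text_vocab_size=1000, codebook_size=16)
seq = build_sequence(vocab, text_ids=[5, 42, 7], speech_codes=[0, 3, 15])
print(seq)  # [5, 42, 7, 1000, 1003, 1015]
```

With one id space, a single next-token objective covers both modalities. In a Hugging Face Transformers setup, the same effect is commonly obtained by calling `add_tokens` on the tokenizer and `resize_token_embeddings` on the model; whether this repository does it that way is not stated in the summary.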

Quick Start & Requirements

  • Run: torchrun --nproc_per_node=8 train_tts.py config.json (launches 8 processes on one node), or submit a Slurm job with sbatch run_slurm.sh
  • Prerequisites: Python, PyTorch, the xcodec2 codec (distributed via Hugging Face), and the Llama tokenizer. Training requires significant computational resources.
  • Data: Open-source datasets (LibriHeavy, Emilia, WenetSpeech4TTS) totaling 160,000 hours are available. Models are trained on 250,000 hours, including 90,000 hours of internal data.
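
The data figures above can be checked with simple arithmetic. The snippet below is only a sanity check on the numbers quoted in this summary; the variable names are illustrative.

```python
# Hours quoted in the summary above.
OPEN_HOURS = 160_000      # LibriHeavy, Emilia, WenetSpeech4TTS combined
INTERNAL_HOURS = 90_000   # internal data, not publicly released

total = OPEN_HOURS + INTERNAL_HOURS
open_share = OPEN_HOURS / total

print(total)                 # 250000
print(f"{open_share:.0%}")   # 64%
```

Roughly two thirds of the training data is therefore publicly available, and the remaining third is not.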

Highlighted Details

  • Supports multilingual speech synthesis (Chinese, English, Japanese, Korean).
  • Offers LLaSA 1B models, including multilingual and finetuned versions.
  • Utilizes X-codec2 for speech tokenization.
  • Paper released (2025-02-07).

Maintenance & Community

  • Recent updates include finetune instructions and multilingual model releases.
  • No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not specify a license. Compatibility for commercial or closed-source use is undetermined.

Limitations & Caveats

The project relies on internal datasets not available for public release, which may limit reproducibility for users without access to similar proprietary data. The absence of a specified license raises concerns about commercial use.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 14 stars in the last 30 days
