FunCodec by modelscope

Speech codec toolkit for audio quantization and downstream tasks

created 1 year ago
414 stars

Top 71.8% on sourcepulse

Project Summary

FunCodec is a research toolkit for neural speech codecs, focusing on audio quantization and enabling downstream applications like text-to-speech and music generation. It provides a fundamental, reproducible, and integrable framework for researchers and developers in speech and audio processing.

How It Works

FunCodec models encode audio into sequences of discrete tokens, a compact representation that downstream models can generate or manipulate. The toolkit supports several quantization strategies and model architectures, including ones inspired by Encodec, so users can trade bitrate against reconstruction quality. Detailed recipes for training and inference support reproducibility.
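To make the idea of quantizing audio into discrete tokens concrete, here is a minimal sketch of residual vector quantization (RVQ), the multi-stage scheme used by Encodec-style codecs. It is an illustration under stated assumptions, not FunCodec's implementation: the function names, the random codebooks, and the frame rate and codebook sizes are all hypothetical. In a real codec the codebooks are learned and the input frames come from a neural encoder.

```python
import math
import random


def nearest(codebook, vector):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vector))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(frame, codebooks):
    """Quantize one frame: each stage encodes the residual left by earlier stages."""
    residual = list(frame)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct a frame by summing the selected entry from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def bitrate_bps(frame_rate_hz, num_quantizers, codebook_size):
    """Bits per second = frames per second * stages * bits per index."""
    return frame_rate_hz * num_quantizers * math.log2(codebook_size)


# Hypothetical toy setup: random codebooks stand in for learned ones.
rng = random.Random(0)
DIM, CODEBOOK_SIZE, NUM_QUANTIZERS = 4, 16, 3
codebooks = [
    [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(CODEBOOK_SIZE)]
    for _ in range(NUM_QUANTIZERS)
]
frame = [rng.gauss(0.0, 1.0) for _ in range(DIM)]

tokens = rvq_encode(frame, codebooks)           # one integer token per stage
reconstruction = rvq_decode(tokens, codebooks)  # approximate copy of `frame`
```

The bitrate formula shows where the bitrate trade-off comes from. With illustrative values of 50 frames per second, 8 quantizer stages, and 1024-entry codebooks, `bitrate_bps(50, 8, 1024)` gives 4000 bits per second, and decoding with fewer stages lowers the bitrate at the cost of a coarser reconstruction.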

Quick Start & Requirements

  • Install: git clone https://github.com/alibaba/FunCodec.git && cd FunCodec && pip install --editable ./
  • Prerequisites: Python and PyTorch. A CUDA-capable GPU is recommended for training and inference.
  • Resources: Pre-trained models are available on Hugging Face and ModelScope. Training recipes are provided for LibriTTS and custom datasets.
  • Links: Hugging Face models, ModelScope

Highlighted Details

  • Supports multiple audio codec models with varying bitrates and parameter counts.
  • Includes recipes for training and inference, with examples for LibriTTS.
  • Integrates with LauraTTS for zero-shot text-to-speech, which the README reports as outperforming VALL-E.
  • Allows training on custom datasets using Kaldi-like wav.scp format.
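For the custom-dataset case, a Kaldi-style wav.scp file is a plain-text index with one line per utterance: an utterance ID, a space, and the path to the audio. The sketch below builds such a file from a directory of recordings. The helper name `build_wav_scp` is hypothetical, and using the file stem as the utterance ID is an assumption; check the FunCodec recipes for the exact layout they expect.

```python
from pathlib import Path


def build_wav_scp(audio_dir, scp_path, pattern="*.wav"):
    """Write one '<utt_id> <absolute_path>' line per audio file, sorted by ID.

    Returns the number of entries written.
    """
    entries = {}
    for wav in Path(audio_dir).rglob(pattern):
        # Assumption: the file stem serves as the utterance ID.
        # IDs must not contain whitespace, so collapse any into underscores.
        utt_id = "_".join(wav.stem.split())
        if utt_id in entries:
            raise ValueError(f"duplicate utterance ID: {utt_id}")
        entries[utt_id] = wav.resolve()

    lines = [f"{utt_id} {path}" for utt_id, path in sorted(entries.items())]
    text = "\n".join(lines) + ("\n" if lines else "")
    Path(scp_path).write_text(text, encoding="utf-8")
    return len(lines)
```

Calling `build_wav_scp("my_corpus", "wav.scp")` (both paths are placeholders) would scan `my_corpus` recursively and write the index next to the script. Sorting by ID follows the usual Kaldi convention, and the duplicate check catches files in different subdirectories that share a name.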

Maintenance & Community

The README describes the project as under active development, with recent updates including LauraTTS recipes. It acknowledges borrowing code from Kaldi and ESPnet.

Licensing & Compatibility

Licensed under the MIT License, which is generally permissive for commercial use and closed-source linking. The project notes that it also contains third-party components under other open-source licenses.

Limitations & Caveats

The README describes the project as "still working on progress," so it may change and some features may be incomplete. Apart from the LauraTTS comparison, the README gives few performance claims or detailed benchmarks.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 19 stars in the last 90 days
