FunCodec by modelscope

Speech codec toolkit for audio quantization and downstream tasks

Created 2 years ago

436 stars

Top 68.4% on SourcePulse

Project Summary

FunCodec is a research toolkit for neural speech codecs, focusing on audio quantization and enabling downstream applications like text-to-speech and music generation. It provides a fundamental, reproducible, and integrable framework for researchers and developers in speech and audio processing.

How It Works

FunCodec encodes audio into discrete tokens with a neural codec, giving a compact representation that downstream models can manipulate. It supports various quantization strategies and model architectures, including those inspired by Encodec, allowing for flexible experimentation with different bitrates and performance trade-offs. The toolkit emphasizes reproducibility through detailed recipes for training and inference.
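
To make the idea of quantizing audio into discrete tokens concrete, here is a minimal sketch of residual vector quantization, the scheme used by Encodec-style codecs. It is not FunCodec's API: every function name is hypothetical, the random codebooks stand in for learned ones, and the frame rate, quantizer count, and codebook size are illustrative values rather than settings taken from the toolkit.

```python
import math
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec in stages; each stage encodes the previous residual."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen entry so the next stage sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Reconstruct a frame by summing the selected entry of each codebook."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def bitrate_bps(frame_rate_hz, num_quantizers, codebook_size):
    """Bits per second: frames/s x tokens/frame x bits/token."""
    return frame_rate_hz * num_quantizers * math.log2(codebook_size)


# Illustrative setup (assumed values): 3 stages of 8 entries over 4-dim frames.
rng = random.Random(0)
dim, num_quantizers, codebook_size = 4, 3, 8
codebooks = [
    [[rng.gauss(0, 1.0 / (stage + 1)) for _ in range(dim)]
     for _ in range(codebook_size)]
    for stage in range(num_quantizers)
]
frame = [rng.gauss(0, 1) for _ in range(dim)]

tokens = rvq_encode(frame, codebooks)   # one integer token per stage
recon = rvq_decode(tokens, codebooks)   # approximate frame
```

The bitrate trade-off follows from the last helper: using fewer quantizer stages or smaller codebooks lowers the bitrate at the cost of a coarser reconstruction.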

Quick Start & Requirements

Install: git clone https://github.com/alibaba/FunCodec.git && cd FunCodec && pip install --editable ./
Prerequisites: Python and PyTorch. A CUDA-capable GPU is recommended for training and inference.
Resources: Pre-trained models are available on Hugging Face and ModelScope. Training recipes are provided for LibriTTS and custom datasets.
Links: Huggingface Models, ModelScope

Highlighted Details

Supports multiple audio codec models with varying bitrates and parameter counts.
Includes recipes for training and inference, with examples for LibriTTS.
Offers integration with LauraTTS for zero-shot text-to-speech, which the README reports as outperforming VALL-E.
Allows training on custom datasets using Kaldi-like wav.scp format.
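
For the custom-dataset case, a Kaldi-style wav.scp file lists one utterance per line as an ID followed by the audio path. The sketch below builds such a file from a directory of .wav files. The helper name write_wav_scp is hypothetical, and using the file stem as the utterance ID is an assumption; check the FunCodec recipes for the exact layout they expect.

```python
import tempfile
from pathlib import Path


def write_wav_scp(wav_dir, scp_path):
    """Write '<utt_id> <absolute_path>' lines, sorted by utterance ID.

    Assumes file stems are unique; files with the same stem in different
    subdirectories would produce duplicate IDs.
    """
    entries = sorted((p.stem, p.resolve()) for p in Path(wav_dir).rglob("*.wav"))
    with open(scp_path, "w", encoding="utf-8") as f:
        for utt_id, path in entries:
            f.write(f"{utt_id} {path}\n")
    return len(entries)


# Demo on a throwaway directory with two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("utt_b.wav", "utt_a.wav"):
        (root / name).touch()
    scp_file = root / "wav.scp"
    num_written = write_wav_scp(root, scp_file)
    scp_lines = scp_file.read_text(encoding="utf-8").splitlines()
```

Sorting by utterance ID follows the Kaldi convention, where many tools expect scp files in sorted order.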

Maintenance & Community

The project is under active development, with recent updates including LauraTTS recipes. It acknowledges borrowing code from Kaldi and ESPnet.

Licensing & Compatibility

Licensed under the MIT License, which is generally permissive for commercial use and closed-source linking. The project notes that it also contains third-party components under other open-source licenses.

Limitations & Caveats

The README describes the project as "still working on progress," so features may be incomplete and subject to change. Beyond the LauraTTS comparison, the README gives few performance claims or detailed benchmarks.
