FunCodec  by modelscope

Speech codec toolkit for audio quantization and downstream tasks

Created 1 year ago
425 stars

Top 69.4% on SourcePulse

Project Summary

FunCodec is a research toolkit for neural speech codecs, focusing on audio quantization and enabling downstream applications like text-to-speech and music generation. It provides a fundamental, reproducible, and integrable framework for researchers and developers in speech and audio processing.

How It Works

FunCodec takes a codec-based approach: it quantizes audio into discrete tokens that can be stored and manipulated efficiently. It supports several quantization strategies and model architectures, including ones inspired by Encodec, so users can experiment with different bitrates and quality trade-offs. The toolkit emphasizes reproducibility through detailed recipes for training and inference.
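As a rough illustration of quantizing audio into discrete tokens, the sketch below implements residual vector quantization (RVQ), the scheme used by Encodec. It is not FunCodec's API: the function names, the toy codebooks, and the frame rate and codebook size in the bitrate example are all assumptions chosen for illustration.

```python
import math

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(codebooks, vec):
    """Quantize in stages: each stage encodes the residual left by the previous one."""
    residual = list(vec)
    indices = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return indices

def rvq_decode(codebooks, indices):
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for cb, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out

def bitrate_bps(frame_rate_hz, num_quantizers, codebook_size):
    """Bits per second = frames per second * indices per frame * bits per index."""
    return frame_rate_hz * num_quantizers * math.log2(codebook_size)

# Toy setup (hypothetical values): two stages, four 2-dimensional entries each.
codebooks = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],      # coarse stage
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]],  # fine stage
]
frame = [0.9, 0.3]                    # stands in for one encoder output frame
tokens = rvq_encode(codebooks, frame)
recon = rvq_decode(codebooks, tokens)
print(tokens, recon)                  # [1, 2] [1.0, 0.25]
print(bitrate_bps(50, 8, 1024))       # 4000.0
```

Using fewer stages at decode time lowers the bitrate at the cost of a coarser reconstruction, which is one way a single codec can cover several bitrates.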

Quick Start & Requirements

  • Install: git clone https://github.com/alibaba/FunCodec.git && cd FunCodec && pip install --editable ./
  • Prerequisites: Python, PyTorch. GPU with CUDA is recommended for training and inference.
  • Resources: Pre-trained models are available on Hugging Face and ModelScope. Training recipes are provided for LibriTTS and custom datasets.
  • Links: Hugging Face Models, ModelScope
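Before installing, it can help to confirm which of the prerequisites listed above are visible to the current interpreter. The following is a minimal sketch; `check_prerequisites` is a hypothetical helper, not part of FunCodec, and it only uses `importlib` plus PyTorch's `torch.cuda.is_available()`.

```python
import importlib.util
import sys

def check_prerequisites():
    """Report the Python version, whether PyTorch is installed, and whether CUDA is usable."""
    report = {
        "python": sys.version.split()[0],
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report

print(check_prerequisites())
```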

Highlighted Details

  • Supports multiple audio codec models with varying bitrates and parameter counts.
  • Includes recipes for training and inference, with examples for LibriTTS.
  • Offers integration with LauraTTS for zero-shot text-to-speech, which the README reports as outperforming VALL-E.
  • Allows training on custom datasets using Kaldi-like wav.scp format.
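For the custom-dataset case, a Kaldi-style wav.scp file lists one utterance per line as `<utt_id> <path>`. The sketch below builds and reads such a file; the helper names are hypothetical, and deriving the utterance ID from the file stem is an assumption (files with the same name in different subdirectories would collide).

```python
from pathlib import Path

def build_wav_scp(audio_dir, scp_path):
    """Write one '<utt_id> <absolute_path>' line per .wav file, sorted by utterance ID."""
    entries = sorted((p.stem, p.resolve()) for p in Path(audio_dir).rglob("*.wav"))
    with open(scp_path, "w", encoding="utf-8") as f:
        for utt_id, path in entries:
            f.write(f"{utt_id} {path}\n")
    return len(entries)

def read_wav_scp(scp_path):
    """Parse a wav.scp file into a dict mapping utterance ID to path."""
    table = {}
    with open(scp_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split on the first whitespace run only, so paths may contain spaces.
            utt_id, path = line.split(maxsplit=1)
            table[utt_id] = path
    return table
```

A typical call would be `build_wav_scp("my_corpus", "wav.scp")`, where `my_corpus` is a placeholder directory; check the FunCodec recipes for the exact layout they expect.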

Maintenance & Community

The README describes the project as under active development, with recent updates including LauraTTS recipes, although the health check below reports the last commit as one year ago. It acknowledges borrowing code from Kaldi and ESPnet.

Licensing & Compatibility

Licensed under the MIT License, which is generally permissive for commercial use and closed-source linking. The project notes that it also contains third-party components under other open-source licenses.

Limitations & Caveats

The README describes the project as "still working on progress," so features may be incomplete and interfaces may change. Beyond the LauraTTS comparison, the README gives few performance claims or detailed benchmarks.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
