WavTokenizer by jishengpeng

Research paper for discrete acoustic codec models

created 11 months ago
1,171 stars

Top 33.9% on sourcepulse

View on GitHub
Project Summary

WavTokenizer provides state-of-the-art discrete acoustic codec models for audio language modeling, enabling efficient representation of speech and music with low token rates. It is designed for researchers and developers working on advanced audio processing and generative AI models, offering strong reconstruction capabilities and rich semantic information.

How It Works

WavTokenizer employs a neural network architecture to quantize audio waveforms into discrete tokens. It achieves high compression rates by learning a compact representation of audio signals, allowing for efficient processing and generation. The models are trained on diverse datasets, enabling them to capture nuances across speech, audio, and music domains.
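The quantization idea can be illustrated with a minimal vector-quantization sketch. This is a toy NumPy example, not WavTokenizer's actual architecture: each frame embedding from a hypothetical encoder is mapped to the index of its nearest codebook vector, and that index is the discrete token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "codebook": 8 code vectors of dimension 4 (learned in a real model).
codebook = rng.normal(size=(8, 4))

# Toy "encoder output": 6 frame embeddings for a short audio clip.
frames = rng.normal(size=(6, 4))

def quantize(frames, codebook):
    """Map each frame to the index of its nearest codebook entry."""
    # Pairwise squared distances, shape (n_frames, n_codes).
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)  # one discrete token per frame

tokens = quantize(frames, codebook)          # e.g. 6 integers in [0, 8)
reconstruction = codebook[tokens]            # what a decoder would receive
```

A real codec adds an encoder, a decoder, and training losses around this lookup, but the token stream it produces has the same form: a short sequence of integers per second of audio.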

Quick Start & Requirements

  • Install via conda create -n wavtokenizer python=3.9 and conda activate wavtokenizer, then pip install -r requirements.txt.
  • Requires PyTorch and torchaudio.
  • Official HuggingFace model hub links are provided for various checkpoints.
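The setup steps above, collected into one shell snippet. The commands come from the bullets; the clone URL and the location of requirements.txt at the repo root are assumptions.

```shell
# Clone the repository (URL assumed from the project's GitHub page)
git clone https://github.com/jishengpeng/WavTokenizer.git
cd WavTokenizer

# Create and activate an isolated environment
conda create -n wavtokenizer python=3.9
conda activate wavtokenizer

# Install Python dependencies (PyTorch and torchaudio among them)
pip install -r requirements.txt
```

Model checkpoints are downloaded separately from the HuggingFace links in the README.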

Highlighted Details

  • Represents audio with as few as 40 or 75 tokens per second.
  • Offers strong audio reconstruction results.
  • Models are available for speech, audio, and music domains.
  • Supports training from scratch with provided configuration and training scripts.
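To put those token rates in perspective, a quick back-of-the-envelope calculation (assuming 24 kHz input audio) shows how many raw samples each discrete token stands in for:

```python
sample_rate = 24_000  # Hz; assumed input rate, not stated in this summary

for tokens_per_sec in (40, 75):
    samples_per_token = sample_rate / tokens_per_sec
    print(f"{tokens_per_sec} tok/s -> one token per {samples_per_token:.0f} samples")
    # 40 tok/s -> one token per 600 samples
    # 75 tok/s -> one token per 320 samples
```

In other words, each token summarizes hundreds of waveform samples, which is what makes the representation practical for audio language models.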

Maintenance & Community

The project has released multiple checkpoints and updates, including camera-ready versions for ICLR 2025 and arXiv preprints. Links to HuggingFace for models are available.

Licensing & Compatibility

The README's model table marks all listed models as open-source, but no explicit license is stated there. Users should verify the actual license terms before relying on the models in research or commercial applications.

Limitations & Caveats

The README references placeholder paths such as configs/xxx.yaml and xxx.ckpt, so users must supply their own config and checkpoint files; these are not bundled in the main repository structure. The training process defers to PyTorch Lightning documentation for customization, which implies a learning curve for custom training setups.

Health Check
Last commit

5 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
57 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Jeff Hammerbacher (cofounder of Cloudera).

AudioGPT by AIGC-Audio

Top 0.1% on sourcepulse
10k stars
Audio processing and generation research project
created 2 years ago
updated 1 year ago