stable-codec by Stability-AI

Transformer-based audio codec for low-bitrate, high-quality audio coding

Created 9 months ago
392 stars

Top 73.4% on SourcePulse

Project Summary

This repository provides Transformer-based audio codecs for high-quality speech coding at low bitrates, aimed at researchers and developers working on audio processing and speech coding. It reports state-of-the-art performance and targets applications such as speech synthesis and efficient audio transmission.

How It Works

The Stable Codec family uses Transformer architectures with sliding-window attention for efficient audio encoding and decoding. It employs a Finite Scalar Quantization (FSQ) bottleneck, which can be reconfigured post hoc to reduce the token dictionary size, which makes the tokens better suited to large language models. This design balances reconstruction quality against compression efficiency.
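
To make the dictionary-size point concrete, here is a minimal Python sketch of an FSQ-style bottleneck. It is not the repository's implementation: the function names, the tanh bounding, and the level counts are illustrative assumptions. Each latent dimension is squashed into a bounded range and rounded to one of a fixed number of levels, so the token dictionary size is the product of the per-dimension level counts, and choosing fewer levels yields a smaller dictionary.

```python
import math


def fsq_quantize(latent, levels):
    """Round each latent dimension to one of levels[i] evenly spaced steps.

    Hypothetical helper: returns integer level indices in 0..levels[i]-1.
    """
    codes = []
    for value, n in zip(latent, levels):
        bounded = math.tanh(value)                 # squash into (-1, 1)
        index = round((bounded + 1) / 2 * (n - 1)) # nearest of n levels
        codes.append(index)
    return codes


def codes_to_token(codes, levels):
    """Pack per-dimension indices into one token id (mixed-radix encoding)."""
    token = 0
    for index, n in zip(codes, levels):
        token = token * n + index
    return token


def dictionary_size(levels):
    """Number of distinct tokens the bottleneck can emit."""
    return math.prod(levels)


levels = [5, 5, 5]                                  # assumed level counts
codes = fsq_quantize([0.0, 10.0, -10.0], levels)    # [2, 4, 0]
print(codes, codes_to_token(codes, levels))         # [2, 4, 0] 70
print(dictionary_size(levels))                      # 125
print(dictionary_size([3, 3, 3]))                   # 27, a coarser setting
```

Because the quantizer has no learned codebook, changing the level counts only changes how finely each dimension is rounded, which is what makes a post hoc reconfiguration of this kind plausible.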

Quick Start & Requirements

Highlighted Details

  • Offers two variants: stable-codec-speech-16k (fine-tuned for downstream tasks) and stable-codec-speech-16k-base (for reproducibility).
  • The fine-tuned variant includes 500k training steps with force-aligned data and a CTC loss, intended to improve its applicability to TTS.
  • Supports post-hoc bottleneck configuration for flexible token dictionary sizes and bitrates (e.g., 400bps, 700bps, 1000bps).
  • The fine-tuned model achieves an SI-SDR of 3.58 dB and a PESQ of 3.01.
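
The bitrates in the list above follow from simple arithmetic: bits per token, times tokens per frame, times frames per second. The Python sketch below assumes a 25 Hz frame rate and three hypothetical token layouts. None of these numbers are stated in this summary; they are chosen only to show how settings near 400, 700, and 1000 bps can arise from different dictionary sizes.

```python
import math


def bitrate_bps(codebook_size, tokens_per_frame, frames_per_second):
    """Bits per second needed to transmit the token stream."""
    bits_per_token = math.log2(codebook_size)
    return bits_per_token * tokens_per_frame * frames_per_second


FRAME_RATE_HZ = 25  # assumption, not taken from the summary

# (codebook size, tokens per frame): hypothetical layouts for illustration
layouts = [(46656, 1), (15625, 2), (729, 4)]

for size, count in layouts:
    rate = bitrate_bps(size, count, FRAME_RATE_HZ)
    print(f"{count} x {size}: {rate:.0f} bps")
```

Under these assumptions the three layouts come out at roughly 388, 697, and 951 bps, showing the trade-off: several small dictionaries per frame cost more bits than one large dictionary, but each token is drawn from a much smaller vocabulary.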

Maintenance & Community

  • Developed by Stability AI.
  • Changelog indicates ongoing development and bug fixes.
  • Further training details and dataset configuration are available in stable-audio-tools documentation.

Licensing & Compatibility

  • Code is MIT licensed.
  • Model weights are covered by the Stability AI Community License.
  • Whether the weights can be used commercially or linked into closed-source products depends on the terms of the Stability AI Community License.

Limitations & Caveats

The model has a hard requirement for FlashAttention, which rules out CPU inference and restricts it to GPUs that FlashAttention supports. The "stable-codec-speech-16k" variant shows slightly lower objective reconstruction metrics than the base model.
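
Given this requirement, a pre-flight check can fail early with a clear message instead of partway through model loading. The Python sketch below is an assumption-laden example, not part of the repository: `check_environment` is a hypothetical helper, and it only tests that the `flash_attn` and `torch` packages are importable and that PyTorch reports a CUDA device. It does not verify that the specific GPU is supported by FlashAttention.

```python
import importlib.util


def check_environment():
    """Return (ok, reason) describing whether GPU inference looks possible.

    Hypothetical helper: a coarse check, not a guarantee of compatibility.
    """
    if importlib.util.find_spec("flash_attn") is None:
        return False, "flash_attn package is not installed"
    if importlib.util.find_spec("torch") is None:
        return False, "torch package is not installed"

    import torch  # imported lazily so the check works without PyTorch

    if not torch.cuda.is_available():
        return False, "no CUDA device visible to PyTorch"
    return True, "flash_attn and a CUDA device were found"


ok, reason = check_environment()
print("ready" if ok else "not ready", "-", reason)
```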

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
