stable-codec by Stability-AI

Transformer-based audio codec for low-bitrate, high-quality audio coding

created 8 months ago
387 stars

Top 75.2% on sourcepulse

Project Summary

This repository provides Transformer-based audio codecs for high-quality audio at low bitrates, targeting researchers and developers in audio processing and speech coding. It offers state-of-the-art performance for applications like speech synthesis and efficient audio transmission.

How It Works

The Stable Codec family uses Transformer architectures with sliding-window attention for efficient audio encoding and decoding. Its bottleneck is based on Finite Scalar Quantization (FSQ), which rounds each latent dimension to a small fixed set of levels. The bottleneck can be reconfigured post-hoc to shrink the token dictionary, which makes the tokens easier to use with large language models. This design trades reconstruction quality against compression efficiency.
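
To make the bottleneck idea concrete, here is a minimal sketch of finite scalar quantization in plain Python. It is not the repository's implementation: the function names, the tanh bounding, and the level counts are assumptions chosen for illustration. Each latent value is squashed into a bounded range, rounded to one of a few integer levels, and the per-dimension codes are packed into a single token id.

```python
import math

def fsq_quantize(latent, levels):
    """Round each latent value to one of levels[i] integer codes (hypothetical helper)."""
    codes = []
    for z, n in zip(latent, levels):
        half = (n - 1) / 2             # odd level counts keep the grid symmetric around 0
        bounded = math.tanh(z) * half  # squash the value into (-half, half)
        codes.append(int(round(bounded)) + int(half))  # shift to the range 0..n-1
    return codes

def codes_to_token(codes, levels):
    """Pack per-dimension codes into one token id using mixed-radix encoding."""
    token = 0
    for c, n in zip(codes, levels):
        token = token * n + c
    return token

def codebook_size(levels):
    """The implied dictionary size is the product of the per-dimension level counts."""
    return math.prod(levels)

# Illustrative values only: three latent dimensions with five levels each.
levels = [5, 5, 5]
codes = fsq_quantize([0.0, 10.0, -10.0], levels)
print(codes)                          # [2, 4, 0]
print(codes_to_token(codes, levels))  # 70
print(codebook_size(levels))          # 125
```

Because the dictionary size is just the product of the level counts, using fewer levels per dimension gives a smaller token dictionary at the cost of coarser reconstruction.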

Quick Start & Requirements

Highlighted Details

  • Offers two variants: stable-codec-speech-16k (fine-tuned for downstream tasks) and stable-codec-speech-16k-base (for reproducibility).
  • The fine-tuned variant adds 500k training steps on force-aligned data with a CTC loss, which makes it better suited to TTS.
  • Supports post-hoc bottleneck configuration for flexible token dictionary sizes and bitrates (e.g., 400 bps, 700 bps, 1000 bps).
  • Achieves SI-SDR of 3.58 and PESQ of 3.01 for the fine-tuned model.
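
The link between token dictionary size and bitrate mentioned above can be sketched with simple arithmetic: the bitrate is the number of tokens emitted per second times the bits each token carries. The helper name and the numbers below are hypothetical and are not taken from the model's actual configuration.

```python
import math

def bitrate_bps(frames_per_second, tokens_per_frame, dictionary_size):
    """Bits per second for a token stream (hypothetical helper).

    Each token drawn from a dictionary of `dictionary_size` entries
    carries log2(dictionary_size) bits.
    """
    return frames_per_second * tokens_per_frame * math.log2(dictionary_size)

# Assumed, illustrative settings: 25 frames per second, 256-entry dictionary.
print(bitrate_bps(25, 2, 256))  # 400.0
print(bitrate_bps(25, 4, 256))  # 800.0
```

Under these assumptions, a post-hoc change that splits each frame into more tokens from a smaller dictionary shifts the balance between dictionary size and bitrate without retraining the encoder.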

Maintenance & Community

  • Developed by Stability AI.
  • Changelog indicates ongoing development and bug fixes.
  • Further training details and dataset configuration are available in stable-audio-tools documentation.

Licensing & Compatibility

  • Code is MIT licensed.
  • Model weights are covered by the Stability AI Community License.
  • Compatibility for commercial use or closed-source linking depends on the terms of the Stability AI Community License.

Limitations & Caveats

The model has a hard requirement for FlashAttention, which rules out CPU inference and restricts it to GPUs that FlashAttention supports. The stable-codec-speech-16k variant scores slightly lower on objective reconstruction metrics than the base model.
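
Given the FlashAttention requirement, a quick preflight check before loading a model can save a failed run. The sketch below is an assumption about how one might check the environment, not part of the project: it only tests whether the `flash_attn` package is importable and, if PyTorch is installed, whether a CUDA device is visible.

```python
import importlib.util

def flash_attn_installed():
    """True if the flash_attn package can be found on the import path."""
    return importlib.util.find_spec("flash_attn") is not None

def cuda_available():
    """True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()

def environment_ready():
    """Both conditions are assumed necessary for GPU inference."""
    return flash_attn_installed() and cuda_available()

if __name__ == "__main__":
    print("flash_attn installed:", flash_attn_installed())
    print("CUDA available:", cuda_available())
```

This check does not verify that the installed GPU is one FlashAttention supports; it only catches the two most common missing pieces.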

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 31 stars in the last 90 days
