encodec  by facebookresearch

Neural audio codec for high-fidelity compression research

created 2 years ago
3,750 stars

Top 13.2% on sourcepulse

Project Summary

EnCodec is a state-of-the-art neural audio codec for high-fidelity audio compression, targeting researchers and developers in audio processing and machine learning. It offers significant compression ratios with minimal perceptual quality loss, enabling efficient storage and transmission of audio data.

How It Works

EnCodec compresses audio with a neural network and uses residual vector quantization (RVQ) to represent the signal compactly. It supports a causal model (24 kHz mono) and a non-causal model (48 kHz stereo), with configurable bandwidths from 1.5 kbps to 24 kbps. A pre-trained language model combined with entropy coding can shrink the coded representation further, by up to 40%.
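To make the RVQ idea concrete, here is a minimal pure-Python sketch of residual vector quantization in general. It is not EnCodec's implementation: the codebooks are tiny hand-written tables, and the names `nearest`, `rvq_encode`, `rvq_decode`, and `TOY_CODEBOOKS` are hypothetical, chosen only for illustration. Each stage picks the closest codebook entry to whatever the previous stages failed to represent, so later stages refine the approximation.

```python
# Toy residual vector quantization (RVQ). Illustrative only; the codebooks
# and function names are assumptions, not part of the encodec package.

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage codes the previous residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen entry so the next stage sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Rebuild the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Two hand-written stages: a coarse codebook and a finer one.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],
]
```

With these toy tables, `rvq_encode([1.0, 0.25], TOY_CODEBOOKS)` returns `[1, 2]`, and decoding those indices gives back `[1.0, 0.25]`. Decoding with only the first stage gives the coarser `[1.0, 0.0]`, which mirrors how using fewer quantizer stages corresponds to a lower bandwidth.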

Quick Start & Requirements

  • Install via pip: pip install -U encodec. For the Transformers integration, install Transformers from source instead: pip install -U git+https://github.com/huggingface/transformers.git@main
  • Requirements: Python 3.8+, PyTorch 1.11.0+.
  • Supported Platforms: macOS, recent Linux distributions. Windows support is experimental.
  • Official Docs: Transformers Encodec Docs

Highlighted Details

  • Offers two models: 24 kHz mono (causal) and 48 kHz stereo (non-causal).
  • Supports bandwidths: 1.5, 3, 6, 12, 24 kbps (24 kHz model) and 3, 6, 12, 24 kbps (48 kHz model).
  • Integrates with Hugging Face Transformers for scalable use.
  • Provides command-line tools for compression and decompression.
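The bandwidth lists above lend themselves to a small lookup and some back-of-the-envelope size arithmetic. The sketch below is a standalone helper, not part of the encodec package: the model keys `"24khz"` and `"48khz"` and the function names are hypothetical, and the size estimate ignores any header or container overhead. The `entropy_saving` parameter models the "up to 40%" reduction from entropy coding as a simple fraction.

```python
# Hypothetical helper built from the bandwidths listed in this summary.
# Keys and function names are assumptions made for illustration.

SUPPORTED_KBPS = {
    "24khz": (1.5, 3.0, 6.0, 12.0, 24.0),
    "48khz": (3.0, 6.0, 12.0, 24.0),
}


def check_bandwidth(model, kbps):
    """Return kbps if the given model lists it, otherwise raise ValueError."""
    if model not in SUPPORTED_KBPS:
        raise ValueError(f"unknown model: {model!r}")
    if kbps not in SUPPORTED_KBPS[model]:
        raise ValueError(
            f"{kbps} kbps is not listed for {model}; "
            f"choose one of {SUPPORTED_KBPS[model]}"
        )
    return kbps


def compressed_bytes(kbps, seconds, entropy_saving=0.0):
    """Approximate payload size in bytes, ignoring header overhead.

    entropy_saving is the fraction removed by entropy coding (0.0 to 1.0).
    """
    if not 0.0 <= entropy_saving <= 1.0:
        raise ValueError("entropy_saving must be between 0 and 1")
    bits = kbps * 1000 * seconds * (1.0 - entropy_saving)
    return bits / 8
```

For example, one minute of audio at 6 kbps comes to roughly 45,000 bytes, and about 27,000 bytes if entropy coding achieves the full 40% saving. Requesting 1.5 kbps for the 48 kHz model raises an error, since that bandwidth is only listed for the 24 kHz model.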

Maintenance & Community

  • Developed by Facebook Research.
  • Changelog available for release details.
  • Citation details provided for academic use.

Licensing & Compatibility

  • Released under the MIT license.
  • Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The project states that it is not optimized for long audio files: the entire file is processed at once, which can cause out-of-memory errors. Windows support is experimental.
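One generic way to bound memory use on long recordings is to split the signal into fixed-length segments and process them one at a time. The sketch below only computes the segment boundaries; it is not something the project provides, and `segment_bounds` is a hypothetical name. Cutting audio into independent segments may introduce audible artifacts at the boundaries, so treat this as a starting point under those assumptions rather than a recommended workflow.

```python
# Generic segmentation sketch; not part of the encodec package.

def segment_bounds(n_samples, segment, overlap=0):
    """Yield (start, end) sample ranges that together cover [0, n_samples).

    segment is the maximum segment length in samples; consecutive segments
    share `overlap` samples, which can be used for cross-fading.
    """
    if segment <= 0:
        raise ValueError("segment must be positive")
    if not 0 <= overlap < segment:
        raise ValueError("overlap must satisfy 0 <= overlap < segment")
    stride = segment - overlap
    start = 0
    while start < n_samples:
        end = min(start + segment, n_samples)
        yield start, end
        if end == n_samples:
            break  # the final (possibly shorter) segment has been emitted
        start += stride
```

For a 10-sample signal with 4-sample segments, `list(segment_bounds(10, 4))` yields `[(0, 4), (4, 8), (8, 10)]`. At 24 kHz, a 10-second segment would be `segment=240_000` samples.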

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 85 stars in the last 90 days
