encodec by facebookresearch

Neural audio codec for high-fidelity compression research

Created 3 years ago

3,895 stars

Top 12.3% on SourcePulse

10 Experts Love This Project

Théophile Gervet

Cofounder of Genesis AI

Jiayi Pan

Author of SWE-Gym; MTS at xAI

Omar Sanseviero

DevRel at Google DeepMind

Anastasios Angelopoulos

Cofounder of LMArena

and 6 more!

Project Summary

EnCodec is a state-of-the-art neural audio codec for high-fidelity audio compression, aimed at researchers and developers in audio processing and machine learning. It compresses audio to bitrates between 1.5 kbps and 24 kbps with minimal perceptual quality loss, which makes audio cheaper to store and transmit.

How It Works

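EnCodec is a neural encoder-decoder whose latent representation is discretized with residual vector quantization (RVQ): each codebook in a cascade quantizes the error left by the previous one, so using more codebooks raises both bitrate and fidelity. Two pre-trained models are provided, a causal 24 kHz mono model and a non-causal 48 kHz stereo model, with selectable target bandwidths from 1.5 kbps to 24 kbps (the 48 kHz model starts at 3 kbps). An optional pre-trained language model over the codes, combined with entropy coding, can shrink the compressed representation by up to a further 40%.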
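To make the RVQ idea concrete, here is a toy sketch in plain Python. It is not EnCodec's implementation: the codebooks are random rather than learned, the sizes (4 stages, 16 entries, 8 dimensions) are hypothetical, and each toy codebook includes the zero vector purely so that a stage can never increase the error. It only illustrates how each stage codes the residual of the previous one, and why decoding with fewer stages gives a coarser reconstruction.

```python
import random

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(codebooks, vec):
    """Quantize vec with a cascade of codebooks; each stage codes the previous residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(codebooks, indices):
    """Reconstruct a vector by summing the selected entry from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

def sq_error(a, b):
    """Squared L2 distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy setup (hypothetical sizes, random codebooks instead of learned ones).
rng = random.Random(0)
DIM, ENTRIES, STAGES = 8, 16, 4
codebooks = [
    # Entry 0 is the zero vector (toy convenience); later stages use smaller scales.
    [[0.0] * DIM]
    + [[rng.gauss(0.0, 0.5 ** s) for _ in range(DIM)] for _ in range(ENTRIES - 1)]
    for s in range(STAGES)
]
x = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
codes = rvq_encode(codebooks, x)

# Decoding with only the first k stages mimics choosing a lower target bandwidth.
errors = [
    sq_error(x, rvq_decode(codebooks[:k], codes[:k])) for k in range(1, STAGES + 1)
]
```

Here `errors` holds the reconstruction error after 1, 2, 3 and 4 stages; in this toy setup it never increases as stages are added.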

Quick Start & Requirements

Install via pip: pip install -U encodec. To use the Transformers integration, install Transformers from source: pip install -U git+https://github.com/huggingface/transformers.git@main.
Requirements: Python 3.8+, PyTorch 1.11.0+.
Supported Platforms: macOS, recent Linux distributions. Windows support is experimental.
Official Docs: Transformers Encodec Docs
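Once the package is installed, encoding a file into discrete codes might look like the sketch below, which follows the usage pattern shown in the project's README. The helper names `check_bandwidth` and `encode_to_codes` and the `SUPPORTED_KBPS` table are illustrative and not part of the library, and the audio path is whatever file you supply. The heavy imports sit inside the function so the bandwidth check can be used without PyTorch installed.

```python
# Bandwidths (kbps) per pre-trained model, as listed in this summary.
SUPPORTED_KBPS = {
    "24khz": (1.5, 3.0, 6.0, 12.0, 24.0),
    "48khz": (3.0, 6.0, 12.0, 24.0),
}

def check_bandwidth(model_name, kbps):
    """Return kbps as a float if the named model supports it, else raise ValueError."""
    allowed = SUPPORTED_KBPS[model_name]
    if float(kbps) not in allowed:
        raise ValueError(f"{model_name} supports {allowed} kbps, got {kbps}")
    return float(kbps)

def encode_to_codes(path, model_name="24khz", kbps=6.0):
    """Encode an audio file into EnCodec codes shaped [batch, n_codebooks, frames]."""
    # Deferred imports: these require torch, torchaudio and encodec to be installed.
    import torch
    import torchaudio
    from encodec import EncodecModel
    from encodec.utils import convert_audio

    kbps = check_bandwidth(model_name, kbps)
    if model_name == "24khz":
        model = EncodecModel.encodec_model_24khz()
    else:
        model = EncodecModel.encodec_model_48khz()
    model.set_target_bandwidth(kbps)

    # Load the file and convert it to the model's sample rate and channel count.
    wav, sr = torchaudio.load(path)
    wav = convert_audio(wav, sr, model.sample_rate, model.channels)

    # encode() returns a list of (codes, scale) frames; join the codes along time.
    with torch.no_grad():
        frames = model.encode(wav.unsqueeze(0))
    return torch.cat([codes for codes, _scale in frames], dim=-1)
```

Calling `encode_to_codes("speech.wav", "24khz", 6.0)` (with a hypothetical file name) downloads the pre-trained weights on first use and loads the whole file into memory.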

Highlighted Details

Offers two models: 24 kHz mono (causal) and 48 kHz stereo (non-causal).
Supports bandwidths: 1.5, 3, 6, 12, 24 kbps (24 kHz model) and 3, 6, 12, 24 kbps (48 kHz model).
Also available through Hugging Face Transformers.
Provides command-line tools for compression and decompression.
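The bandwidth options above correspond to how many RVQ codebooks are used per latent frame. The arithmetic below is a sketch under assumptions that are not stated in this summary: 1024 entries per codebook (10 bits per code) and latent frame rates of 75 Hz for the 24 kHz model and 150 Hz for the 48 kHz model, as described in the EnCodec paper. The function names are illustrative.

```python
import math

CODEBOOK_SIZE = 1024                          # assumed entries per codebook (10 bits per code)
FRAME_RATE_HZ = {"24khz": 75, "48khz": 150}   # assumed latent frames per second

def codebooks_for(model_name, kbps):
    """Number of codebooks needed per frame to reach the target bandwidth."""
    bits_per_codebook_per_sec = FRAME_RATE_HZ[model_name] * math.log2(CODEBOOK_SIZE)
    return round(kbps * 1000 / bits_per_codebook_per_sec)

def compressed_bytes(duration_s, kbps):
    """Approximate payload size, ignoring container overhead and entropy coding."""
    return int(duration_s * kbps * 1000 / 8)

def pcm16_bytes(duration_s, sample_rate, channels):
    """Size of the same audio as uncompressed 16-bit PCM."""
    return int(duration_s * sample_rate * channels * 2)

n_low = codebooks_for("24khz", 1.5)     # 2 codebooks
n_high = codebooks_for("24khz", 24.0)   # 32 codebooks
one_minute = compressed_bytes(60, 6.0)  # 45000 bytes
ratio = pcm16_bytes(60, 24000, 1) / one_minute  # 64.0
```

Under these assumptions, one minute of 24 kHz mono audio at 6 kbps is about 45 kB, roughly 64 times smaller than 16-bit PCM, before any additional gain from the optional language model.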

Maintenance & Community

Developed by Facebook Research.
Changelog available for release details.
Citation details provided for academic use.

Licensing & Compatibility

Released under the MIT license.
The license permits commercial use and integration into closed-source projects.

Limitations & Caveats

The project states that it does not optimize for long audio files: the model is applied to the entire file at once, which can cause out-of-memory errors. Windows support is experimental.
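One common workaround for this kind of memory limit is to split a long recording into fixed-length segments and process them one at a time. The helper below is a hypothetical sketch of that bookkeeping only. It is not a feature of the project, and encoding segments independently produces separate code streams and may introduce audible artifacts at segment boundaries.

```python
def chunk_bounds(total_samples, chunk_samples, overlap_samples=0):
    """Yield (start, end) sample ranges that together cover [0, total_samples)."""
    if chunk_samples <= 0 or not 0 <= overlap_samples < chunk_samples:
        raise ValueError("need chunk_samples > 0 and 0 <= overlap_samples < chunk_samples")
    step = chunk_samples - overlap_samples
    start = 0
    while start < total_samples:
        end = min(start + chunk_samples, total_samples)
        yield start, end
        if end == total_samples:
            break  # the last chunk reached the end of the signal
        start += step

# Example: 10 minutes of 24 kHz audio in 10-second segments without overlap.
sample_rate = 24000
segments = list(chunk_bounds(10 * 60 * sample_rate, 10 * sample_rate))
```

Each `(start, end)` pair can then be used to slice the waveform before passing that slice to the encoder, so peak memory depends on the segment length rather than the file length.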

Health Check

Last Commit

2 years ago

Responsiveness

1+ week

Star History

22 stars in the last 30 days