audiocraft by facebookresearch

PyTorch library for audio processing and generation research

created 2 years ago
22,348 stars

Top 1.9% on sourcepulse

Project Summary

AudioCraft is a PyTorch library for deep learning research on audio generation. It provides state-of-the-art models such as MusicGen and AudioGen for audio synthesis, with tools for both inference and training of generative audio models. It is aimed at researchers and developers working in audio AI.

How It Works

AudioCraft is a modular library built on PyTorch that bundles several models. The key components are EnCodec, a neural codec for audio compression; MusicGen, for controllable text-to-music generation; and AudioGen, for text-to-sound synthesis. The library also includes a diffusion-based model (Multi Band Diffusion) and a non-autoregressive model (MAGNeT).
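As a rough illustration of how these components are used for inference, the sketch below generates one clip per text prompt with MusicGen and writes each to disk. It assumes the `facebook/musicgen-small` checkpoint and an 8-second duration; the helper names `clip_stems` and `generate_music` are hypothetical and not part of the library.

```python
from typing import List


def clip_stems(descriptions: List[str], prefix: str = "clip") -> List[str]:
    """Return one output file stem per prompt, e.g. ['clip_0', 'clip_1'].

    Hypothetical helper for this sketch; not part of audiocraft.
    """
    return [f"{prefix}_{i}" for i in range(len(descriptions))]


def generate_music(
    descriptions: List[str],
    duration: float = 8.0,
    checkpoint: str = "facebook/musicgen-small",  # assumed checkpoint name
) -> List[str]:
    """Generate one clip per text prompt and write each clip to an audio file."""
    # Imported lazily so this module can be loaded without audiocraft installed.
    from audiocraft.data.audio import audio_write
    from audiocraft.models import MusicGen

    model = MusicGen.get_pretrained(checkpoint)
    model.set_generation_params(duration=duration)  # clip length in seconds
    wavs = model.generate(descriptions)  # one waveform tensor per prompt

    stems = clip_stems(descriptions)
    for stem, wav in zip(stems, wavs):
        # audio_write appends the file extension to the stem;
        # the "loudness" strategy normalizes the output level.
        audio_write(stem, wav.cpu(), model.sample_rate, strategy="loudness")
    return stems
```

Calling `generate_music(["lo-fi beat with soft piano"])` would download the checkpoint on first use, so the first run may take a while.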

Quick Start & Requirements

  • Install: python -m pip install -U audiocraft
  • Prerequisites: Python 3.9, PyTorch 2.1.0. ffmpeg is recommended.
  • Setup: Basic installation takes a single pip command. Training requires cloning the repo and installing it from source, optionally with extra dependencies (for example, .[wm] for training watermarking models).
  • Docs: https://github.com/facebookresearch/audiocraft
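Before installing, it can help to confirm that the environment meets the prerequisites listed above. The following is a minimal sketch that treats Python 3.9 as a minimum version (an assumption; the listing only says "Python 3.9") and checks whether ffmpeg is on the PATH. The function name `prerequisite_warnings` is hypothetical.

```python
import shutil
import sys
from typing import List, Optional, Tuple


def prerequisite_warnings(
    python_version: Optional[Tuple[int, int]] = None,
    ffmpeg_found: Optional[bool] = None,
) -> List[str]:
    """Return human-readable warnings for any unmet prerequisite.

    Both arguments default to values detected from the current environment.
    """
    if python_version is None:
        python_version = (sys.version_info[0], sys.version_info[1])
    if ffmpeg_found is None:
        ffmpeg_found = shutil.which("ffmpeg") is not None

    warnings = []
    # Assumption: 3.9 is treated as the minimum supported version.
    if python_version < (3, 9):
        warnings.append(
            f"Python {python_version[0]}.{python_version[1]} found; 3.9 is required."
        )
    # ffmpeg is recommended rather than required, so this is only a warning.
    if not ffmpeg_found:
        warnings.append("ffmpeg not found on PATH; it is recommended.")
    return warnings
```

An empty list means both checks passed. The check does not verify the PyTorch version, which would require importing torch.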

Highlighted Details

  • State-of-the-art models: MusicGen (text-to-music), AudioGen (text-to-sound), EnCodec (audio codec), MAGNeT, JASCO, AudioSeal (watermarking).
  • Controllable generation: MusicGen supports text and melody conditioning.
  • Training code: Provided for EnCodec, MusicGen, Multi Band Diffusion, and JASCO.
  • Model storage: Models are hosted on Hugging Face, with cache location configurable.
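The configurable cache location mentioned above can be set from Python before any model is loaded. This sketch assumes the `AUDIOCRAFT_CACHE_DIR` environment variable is what the library reads for its own models; the helper name `set_audiocraft_cache` is hypothetical.

```python
import os
from pathlib import Path


def set_audiocraft_cache(directory: str) -> Path:
    """Create the directory if needed and point the model cache at it.

    Assumption: audiocraft reads AUDIOCRAFT_CACHE_DIR when it downloads models,
    so this must be called before the first model is loaded.
    """
    path = Path(directory).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["AUDIOCRAFT_CACHE_DIR"] = str(path)
    return path
```

Setting the variable in the shell before starting Python has the same effect and avoids ordering issues with imports.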

Maintenance & Community

  • Developed by Facebook Research.
  • Citation details provided for the general framework and specific models.

Licensing & Compatibility

  • Code License: MIT
  • Model Weights License: CC-BY-NC 4.0 (Non-commercial use).

Limitations & Caveats

The model weights are released under a non-commercial license, restricting their use in commercial products.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 6
  • Star history: 492 stars in the last 90 days
