audiocraft by facebookresearch

PyTorch library for audio processing and generation research

Created 2 years ago
22,486 stars

Top 1.8% on SourcePulse

Project Summary

AudioCraft is a PyTorch library for deep learning research on audio generation. It ships state-of-the-art models such as MusicGen and AudioGen for high-quality audio synthesis, and provides tools for both inference and training. It targets researchers and developers working on generative audio.

How It Works

AudioCraft is a modular PyTorch library that bundles several models. EnCodec provides efficient neural audio compression, MusicGen handles controllable text-to-music generation, and AudioGen handles text-to-sound synthesis. The library also includes a diffusion-based model (Multi Band Diffusion) and a non-autoregressive model (MAGNeT), so it covers more than one style of audio generation.
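As an illustration of the text-to-music path, here is a minimal Python sketch that loads a pretrained MusicGen checkpoint and writes one clip per prompt. The library calls (`MusicGen.get_pretrained`, `set_generation_params`, `generate`, `audio_write`) follow the upstream MusicGen documentation and may differ between releases. The names `generate_music` and `slugify`, the output directory, and the `facebook/musicgen-small` default are choices made for this example only.

```python
import re
from pathlib import Path


def slugify(text, max_len=40):
    """Turn a free-text prompt into a short, filesystem-safe name."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "clip"


def generate_music(prompts, out_dir="outputs", duration=8,
                   checkpoint="facebook/musicgen-small"):
    """Generate one clip per text prompt and return the paths written.

    Hypothetical helper: the checkpoint name and defaults are assumptions.
    """
    # Imported lazily so this module still loads where audiocraft is absent.
    from audiocraft.data.audio import audio_write
    from audiocraft.models import MusicGen

    prompts = list(prompts)
    model = MusicGen.get_pretrained(checkpoint)
    model.set_generation_params(duration=duration)  # clip length in seconds
    wavs = model.generate(prompts)  # one waveform per prompt

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (prompt, wav) in enumerate(zip(prompts, wavs)):
        stem = out / f"{index:02d}-{slugify(prompt)}"
        # audio_write appends the extension to the stem it is given.
        audio_write(str(stem), wav.cpu(), model.sample_rate,
                    format="wav", strategy="loudness")
        written.append(Path(f"{stem}.wav"))
    return written
```

Calling `generate_music(["lo-fi hip hop, 90 BPM"])` downloads the checkpoint on first use, so expect the first run to be slow. The weights it downloads fall under the non-commercial license described later in this summary.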

Quick Start & Requirements

  • Install: python -m pip install -U audiocraft
  • Prerequisites: Python 3.9, PyTorch 2.1.0. ffmpeg is recommended.
  • Setup: Basic installation is quick. Training requires cloning the repo and, depending on the workflow, installing optional extras (e.g. .[wm]).
  • Docs: https://github.com/facebookresearch/audiocraft
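
Before installing, it can help to confirm that the local environment matches the prerequisites listed above. The following is a small sketch using only the Python standard library. It treats Python 3.9 as a minimum and PyTorch 2.1.0 as an exact release, which is an assumption about how the listed versions should be read. The function names `probe` and `find_problems` are made up for this example.

```python
import shutil
import sys
from importlib import metadata

# Versions taken from the prerequisites listed above.
REQUIRED_PYTHON = (3, 9)   # assumed to be a minimum
REQUIRED_TORCH = "2.1.0"   # assumed to be the exact release


def probe():
    """Collect the facts about the running environment that matter here."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    return {
        "python": tuple(sys.version_info[:2]),
        "torch": torch_version,
        "ffmpeg": shutil.which("ffmpeg"),
    }


def find_problems(env):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if env["python"] < REQUIRED_PYTHON:
        found = ".".join(map(str, env["python"]))
        problems.append(f"Python {found} is older than 3.9")
    if env["torch"] is None:
        problems.append("PyTorch is not installed")
    elif not env["torch"].startswith(REQUIRED_TORCH):
        # Local build tags such as "2.1.0+cu118" still match the prefix.
        problems.append(f"PyTorch {env['torch']} found, expected {REQUIRED_TORCH}")
    if env["ffmpeg"] is None:
        problems.append("ffmpeg not found on PATH (recommended, not required)")
    return problems


if __name__ == "__main__":
    for line in find_problems(probe()) or ["environment looks fine"]:
        print(line)
```

Separating `probe` from `find_problems` keeps the version logic testable without touching the real interpreter or PATH.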

Highlighted Details

  • State-of-the-art models: MusicGen (text-to-music), AudioGen (text-to-sound), EnCodec (audio codec), MAGNeT, JASCO, AudioSeal (watermarking).
  • Controllable generation: MusicGen supports text and melody conditioning.
  • Training code: Provided for EnCodec, MusicGen, Multi Band Diffusion, and JASCO.
  • Model storage: Models are hosted on Hugging Face, with cache location configurable.
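
To make the configurable cache location concrete, here is a hedged Python sketch. It assumes the override is the `AUDIOCRAFT_CACHE_DIR` environment variable, as named in the upstream README at the time of writing. Check the README for your installed version before relying on it. The helper name `use_cache_dir` is hypothetical.

```python
import os
from pathlib import Path


def use_cache_dir(path):
    """Point AudioCraft's model cache at `path`, creating it if needed.

    Call this before loading any model so the setting is already in
    place when the first download happens.
    """
    cache = Path(path).expanduser().resolve()
    cache.mkdir(parents=True, exist_ok=True)
    # Assumption: AUDIOCRAFT_CACHE_DIR is the variable the library reads.
    os.environ["AUDIOCRAFT_CACHE_DIR"] = str(cache)
    return cache
```

For example, `use_cache_dir("~/models/audiocraft")` at the top of a script keeps downloaded checkpoints on a chosen disk. Other Hugging Face models pulled in as dependencies may use their own cache settings.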

Maintenance & Community

  • Developed by Facebook Research.
  • Citation details provided for the general framework and specific models.

Licensing & Compatibility

  • Code License: MIT
  • Model Weights License: CC-BY-NC 4.0 (Non-commercial use).

Limitations & Caveats

The model weights are released under a non-commercial license (CC-BY-NC 4.0), so they cannot be used in commercial products. The MIT license covers the code only.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 5
  • Star History: 109 stars in the last 30 days
