audiocraft by facebookresearch

PyTorch library for audio processing and generation research

Created 2 years ago

22,873 stars

Top 1.8% on SourcePulse

20 Experts Love This Project

hiyouga

Author of LLaMA-Factory

Jiayi-Pan

Author of SWE-Gym; MTS at xAI

jn2clark

Cofounder of Marqo

calvinfo

Calvin French-Owen

Cofounder of Segment

and 16 more!

Project Summary

AudioCraft is a PyTorch library for deep learning research on audio generation. It provides state-of-the-art models such as MusicGen and AudioGen, with code for both inference and training, and is aimed at researchers and developers working on generative audio.

How It Works

AudioCraft is a modular PyTorch codebase organized around a set of models. EnCodec is a neural audio codec that compresses audio into discrete tokens, MusicGen performs controllable text-to-music generation, and AudioGen performs text-to-sound synthesis. The library also includes a diffusion-based approach (Multi Band Diffusion) and a non-autoregressive generator (MAGNeT).
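
To make the text-to-music path concrete, here is a minimal sketch of generating short clips with MusicGen. It assumes audiocraft is installed and that the facebook/musicgen-small checkpoint can be downloaded; the function name generate_music, the clip_ file stems, and the 8-second default are illustrative choices, not part of the project summary.

```python
def generate_music(prompts, duration_s=8, model_name="facebook/musicgen-small"):
    """Generate one clip per text prompt and write each clip to a WAV file.

    Hypothetical helper for illustration; returns the file stems it wrote.
    """
    # Imported lazily so this function can be defined without audiocraft installed.
    from audiocraft.data.audio import audio_write
    from audiocraft.models import MusicGen

    # Downloads the pretrained weights on first use.
    model = MusicGen.get_pretrained(model_name)
    model.set_generation_params(duration=duration_s)

    # One waveform per prompt, shaped [batch, channels, samples].
    wavs = model.generate(prompts)

    stems = []
    for idx, wav in enumerate(wavs):
        stem = f"clip_{idx}"
        # audio_write adds the file extension (WAV by default) and
        # normalizes loudness when strategy="loudness".
        audio_write(stem, wav.cpu(), model.sample_rate, strategy="loudness")
        stems.append(stem)
    return stems
```

Calling generate_music(["lo-fi beat with warm piano"]) would write clip_0.wav to the working directory. Generation is much faster on a GPU, and the downloaded weights remain subject to the license noted under Licensing & Compatibility.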

Quick Start & Requirements

Install: python -m pip install -U audiocraft
Prerequisites: Python 3.9, PyTorch 2.1.0. ffmpeg is recommended.
Setup: The pip install above is enough for inference; training requires cloning the repo and installing it locally, plus the optional .[wm] extra if you train watermarking models.
Docs: https://github.com/facebookresearch/audiocraft
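
Before installing, it can help to confirm that the environment matches the prerequisites listed above. The following is a rough sketch using only the standard library; check_prerequisites is a hypothetical helper, it treats Python 3.9 as a minimum (an assumption, since the summary only lists the version), and it does not verify the installed PyTorch version.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 9)):
    """Return a dict mapping each prerequisite to True if it looks satisfied."""
    return {
        # Assumption: the listed Python version is treated as a lower bound.
        "python": sys.version_info[:2] >= min_python,
        # Only checks that a torch package can be found, not which version.
        "torch": importlib.util.find_spec("torch") is not None,
        # ffmpeg is recommended rather than required, so False is only a warning.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```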

Highlighted Details

State-of-the-art models: MusicGen (text-to-music), AudioGen (text-to-sound), EnCodec (audio codec), MAGNeT, JASCO, AudioSeal (watermarking).
Controllable generation: MusicGen supports text and melody conditioning.
Training code: Provided for EnCodec, MusicGen, Multi Band Diffusion, and JASCO.
Model storage: Models are hosted on Hugging Face, with cache location configurable.
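
As an illustration of the configurable cache location, the sketch below points the cache at a custom directory before any model is loaded. It assumes the AUDIOCRAFT_CACHE_DIR environment variable described in the project README is what controls the location; use_cache_dir is a hypothetical helper, and other Hugging Face or Torch Hub downloads may use their own cache settings.

```python
import os
from pathlib import Path


def use_cache_dir(path):
    """Create the directory if needed and point the AudioCraft cache at it."""
    cache = Path(path).expanduser().resolve()
    cache.mkdir(parents=True, exist_ok=True)
    # Assumption: audiocraft reads this variable when it downloads model files,
    # so it should be set before the first model is loaded.
    os.environ["AUDIOCRAFT_CACHE_DIR"] = str(cache)
    return cache
```

Call it once at the start of a script, for example use_cache_dir("~/models/audiocraft"), before loading any model.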

Maintenance & Community

Developed by Meta's FAIR team and hosted under the facebookresearch GitHub organization.
Citation details provided for the general framework and specific models.

Licensing & Compatibility

Code License: MIT
Model Weights License: CC-BY-NC 4.0 (Non-commercial use).

Limitations & Caveats

The model weights are released under a non-commercial license, restricting their use in commercial products.

Health Check

Last Commit: 10 months ago
Responsiveness: 1 day
Pull Requests (30d): 3
Issues (30d): 5

Star History

124 stars in the last 30 days

Explore Similar Projects

awesome-audio-plaza by metame-ai

Curated list of audio research papers, projects, and resources

Created 1 year ago

Updated 2 months ago

SongGen by LiuZH-19

Text-to-song generation with an auto-regressive transformer

Created 11 months ago

Updated 2 months ago

UniAudio by yangdongchao

Audio foundation model for universal audio generation

Created 2 years ago

Updated 1 year ago

awesome-large-audio-models by EmulationAI

Curated list of Large Language Models in Audio AI

Created 2 years ago

Updated 2 months ago

Starred by Shawn Wang (Editor of Latent Space) and Sam Partee (Cofounder of Arcade).

soundstorm-pytorch by lucidrains

PyTorch implementation of SoundStorm for efficient parallel audio generation

Created 2 years ago

Updated 8 months ago

Starred by Amin Ahmad (Cofounder of Vectara) and Georgi Gerganov (Author of llama.cpp, whisper.cpp).

WavTokenizer by jishengpeng

Research paper for discrete acoustic codec models

Created 1 year ago

Updated 10 months ago

Starred by Jesse Clark (Cofounder of Marqo).

tango by declare-lab

Diffusion model family for text-to-audio generation

Created 2 years ago

Updated 5 months ago

Starred by Luis Capelo (Cofounder of Lightning AI).

FunMusic by FunAudioLLM

Toolkit for music, song, and audio generation

Created 1 year ago

Updated 7 months ago

SongGeneration by tencent-ailab

AI framework for high-fidelity song generation

Created 7 months ago

Updated 4 weeks ago

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral) and Omar Sanseviero (DevRel at Google DeepMind).

AudioLDM by haoheliu

Audio generation research paper using latent diffusion

Created 2 years ago

Updated 6 months ago

Starred by Benjamin Bolte (Cofounder of K-Scale Labs) and Omar Sanseviero (DevRel at Google DeepMind).

audiolm-pytorch by lucidrains

PyTorch implementation of Google's AudioLM for audio generation

Created 3 years ago

Updated 1 year ago

Starred by Luis Capelo (Cofounder of Lightning AI) and Benjamin Bolte (Cofounder of K-Scale Labs).

Kimi-Audio by MoonshotAI

Audio foundation model for understanding, generation, and conversation

Created 8 months ago

Updated 6 months ago
