PyTorch implementation of Google's MusicLM text-to-music model
This repository provides a PyTorch implementation of Google's MusicLM text-to-music model, targeting researchers and developers interested in generative audio. It offers a functional alternative to the original by substituting key components with publicly available, pre-trained models like CLAP, Encodec, and MERT, enabling faster experimentation and broader accessibility.
How It Works
The implementation models audio generation as a sequence-to-sequence task. It uses CLAP for joint audio-text representation, Encodec for neural audio compression into discrete tokens, and MERT for acoustic understanding. Conditioning signals are modeled autoregressively and fed into the transformers, rather than injected through cross-attention as in the original MusicLM. This modular design makes it easier to experiment with different conditioning signals and with stereo generation.
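To make the staged design concrete, here is a minimal, illustrative sketch of a single stage. All names (StageTransformer, CLAP_VOCAB, the codebook sizes, and so on) are hypothetical placeholders rather than this repository's actual API; the point is only to show conditioning tokens being prepended to the target sequence and modeled autoregressively by a decoder-only transformer, as described above.

```python
# Illustrative sketch only: names and sizes are assumptions, not the repo's API.
import torch
import torch.nn as nn

CLAP_VOCAB, SEMANTIC_VOCAB = 1024, 1024   # assumed codebook sizes
D_MODEL, N_LAYERS, N_HEADS = 512, 6, 8

class StageTransformer(nn.Module):
    """Decoder-only stage: conditioning tokens are prepended to the target
    tokens and the whole sequence is modeled autoregressively (no cross-attention)."""
    def __init__(self, cond_vocab: int, target_vocab: int):
        super().__init__()
        self.cond_emb = nn.Embedding(cond_vocab, D_MODEL)
        self.tgt_emb = nn.Embedding(target_vocab, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, N_LAYERS)
        self.to_logits = nn.Linear(D_MODEL, target_vocab)

    def forward(self, cond_tokens, target_tokens):
        x = torch.cat([self.cond_emb(cond_tokens), self.tgt_emb(target_tokens)], dim=1)
        # additive causal mask so each position attends only to earlier positions
        causal_mask = torch.triu(
            torch.full((x.size(1), x.size(1)), float("-inf")), diagonal=1
        )
        h = self.backbone(x, mask=causal_mask)
        # score only the positions that correspond to target tokens
        return self.to_logits(h[:, cond_tokens.size(1):])

# Stage 1 of 3: CLAP conditioning tokens -> semantic (MERT-derived) tokens.
# Coarse and fine stages over Encodec tokens would follow the same pattern.
semantic_stage = StageTransformer(CLAP_VOCAB, SEMANTIC_VOCAB)
clap_tokens = torch.randint(0, CLAP_VOCAB, (1, 12))          # stand-in conditioning
semantic_tokens = torch.randint(0, SEMANTIC_VOCAB, (1, 50))  # stand-in targets
logits = semantic_stage(clap_tokens, semantic_tokens)        # (1, 50, SEMANTIC_VOCAB)
```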
Quick Start & Requirements
Create and activate the conda environment with `conda env create -f environment.yaml` and `conda activate open-musiclm`. The `environment.yaml` file and pre-trained `musiclm_large_small_context` checkpoints are available.
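As a quick sanity check after installation, the Encodec component (which supplies the discrete audio tokens mentioned above) can be exercised on its own. The sketch below assumes the public `encodec` package from the environment is importable and is independent of this repository's own training and inference scripts; the two-second random waveform is just a stand-in for real audio.

```python
# Round-trip a short waveform through Encodec to verify the audio tokenizer works.
import torch
from encodec import EncodecModel

model = EncodecModel.encodec_model_24khz()  # pretrained 24 kHz Encodec model
model.set_target_bandwidth(6.0)             # bandwidth controls the number of codebooks

# Stand-in audio: 2 seconds of noise, shaped (batch, channels, samples).
wav = torch.randn(1, model.channels, model.sample_rate * 2)

with torch.no_grad():
    encoded_frames = model.encode(wav)                          # list of (codes, scale) frames
    codes = torch.cat([f[0] for f in encoded_frames], dim=-1)   # (batch, n_q, T) discrete tokens
    recon = model.decode(encoded_frames)                        # waveform rebuilt from tokens

print(codes.shape, recon.shape)
```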
Highlighted Details
Maintenance & Community
The repository was last updated about 2 years ago and is currently inactive.
Licensing & Compatibility
Limitations & Caveats
The project aims for rapid replication rather than strict adherence to the original MusicLM architecture. The effectiveness of CLAP's latent space for music generation is still being evaluated, and it may be retrained or substituted if needed. Pre-trained checkpoints are experimental.