open-musiclm by zhvng

PyTorch implementation of Google's MusicLM text-to-music model

created 2 years ago
549 stars

Top 59.0% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of Google's MusicLM text-to-music model for researchers and developers interested in generative audio. It substitutes the original's key components with publicly available pre-trained models (CLAP, Encodec, and MERT), which makes experimentation faster and the model more accessible.

How It Works

The implementation treats audio generation as a sequence-to-sequence task. It uses CLAP for a joint audio-text representation, Encodec to compress audio into discrete acoustic tokens, and MERT for semantic tokens, the role w2v-BERT plays in the original MusicLM. Conditioning signals are modeled autoregressively as part of the transformers' input, whereas the original MusicLM passes them in through cross-attention. This design makes it easier to experiment with different conditioning signals and with stereo generation.
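
The difference between prefix conditioning and cross-attention can be illustrated with a small sketch. It is not taken from the repository: the function names, the vocabulary sizes, and the choice to shift token ids into one shared id range are all assumptions made for illustration. The idea shown is only that conditioning tokens sit at the front of the same sequence the decoder predicts over.

```python
from typing import List, Tuple

# Hypothetical vocabulary sizes, chosen only for this illustration.
CLAP_VOCAB = 1024
SEMANTIC_VOCAB = 1024


def build_prefix_sequence(clap_tokens: List[int], semantic_tokens: List[int]) -> List[int]:
    """Put conditioning tokens in front of target tokens in a single sequence.

    Assumption: each token type gets its own id range so that one
    decoder-only model can tell conditioning tokens from target tokens.
    """
    if any(not 0 <= t < CLAP_VOCAB for t in clap_tokens):
        raise ValueError("CLAP token id out of range")
    if any(not 0 <= t < SEMANTIC_VOCAB for t in semantic_tokens):
        raise ValueError("semantic token id out of range")
    shifted_semantic = [CLAP_VOCAB + t for t in semantic_tokens]
    return list(clap_tokens) + shifted_semantic


def next_token_pairs(sequence: List[int], prefix_len: int) -> List[Tuple[List[int], int]]:
    """Return (context, target) pairs; only positions after the prefix are predicted."""
    return [(sequence[:i], sequence[i]) for i in range(prefix_len, len(sequence))]


sequence = build_prefix_sequence([3, 7], [0, 5, 2])
pairs = next_token_pairs(sequence, prefix_len=2)
```

With cross-attention, the conditioning would instead be held in a separate memory that every layer attends to. With a prefix, swapping or adding a conditioning signal only changes how the input sequence is assembled.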

Quick Start & Requirements

  • Install by running conda env create -f environment.yaml, then activate the environment with conda activate open-musiclm.
  • Requires Python, PyTorch, and the dependencies listed in environment.yaml.
  • Training proceeds in stages: first CLAP RVQ and Hubert K-means, then the semantic, coarse, and fine audio generation stages.
  • Inference scripts are provided for generating audio from text prompts.
  • Official checkpoints for musiclm_large_small_context are available.
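
The first training stage named above, CLAP RVQ, refers to residual vector quantization: a continuous embedding is turned into a short list of discrete tokens by quantizing it level by level, each level encoding what the previous ones left over. The sketch below shows the encoding and decoding steps in plain Python. The codebooks are tiny hand-written examples and the function names are hypothetical; in practice the codebooks are learned from data, and the repository's own implementation may differ.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_index(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared distance to vec."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(embedding: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Quantize the embedding, then quantize each remaining residual in turn."""
    residual = list(embedding)
    tokens = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> List[float]:
    """Approximate the embedding as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Two toy codebooks with two 2-D codewords each (illustrative values only).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
tokens = rvq_encode([1.2, 0.8], codebooks)
approx = rvq_decode(tokens, codebooks)
```

Each added level refines the approximation, so the token list can be kept short while still capturing most of the embedding.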

Highlighted Details

  • Replaces MuLan with CLAP, SoundStream with Encodec, and w2v-BERT with MERT.
  • Autoregressively models conditioning signals, differing from MusicLM's cross-attention.
  • Supports variable token sequences for easier experimentation.
  • Provides scripts for training and inference, including a top-match inference option.
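
One plausible reading of the top-match option is: generate several candidates for a prompt, then keep the ones whose audio embedding sits closest to the prompt's text embedding. The sketch below shows that ranking step only. It assumes the embeddings are already computed (for example by a joint audio-text model such as CLAP), and the function names are hypothetical rather than the repository's API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(
    text_embedding: Sequence[float],
    candidate_embeddings: Sequence[Sequence[float]],
    k: int = 1,
) -> List[Tuple[int, float]]:
    """Return (candidate index, similarity) for the k best-matching candidates."""
    scored = [
        (i, cosine_similarity(text_embedding, emb))
        for i, emb in enumerate(candidate_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for one prompt and three generated clips.
prompt = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
best = top_matches(prompt, candidates, k=2)
```

Ranking by similarity trades extra generation time for a better chance that the returned clip matches the prompt.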

Maintenance & Community

  • Developed by zhvng, with contributions acknowledged from Okio (Nendo) and @lucidrains.
  • A Discord server is available for community involvement.

Licensing & Compatibility

  • The repository itself is not explicitly licensed in the README.
  • Dependencies (CLAP, Encodec, MERT) have their own licenses, which may impact commercial use or closed-source linking.

Limitations & Caveats

The project aims for rapid replication rather than strict adherence to the original MusicLM architecture. The effectiveness of CLAP's latent space for music generation is still being evaluated, and CLAP may be retrained or replaced if needed. The pre-trained checkpoints are experimental.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days
