PyTorch implementation of Google's MusicLM text-to-music model
This repository provides a PyTorch implementation of Google's MusicLM text-to-music model, targeting researchers and developers interested in generative audio. It offers a functional alternative to the original by substituting key components with publicly available, pre-trained models like CLAP, Encodec, and MERT, enabling faster experimentation and broader accessibility.
How It Works
The implementation models audio generation as a sequence-to-sequence task. It uses CLAP for joint audio-text representation, Encodec for neural audio compression into discrete tokens, and MERT for acoustic understanding. Conditioning signals are modeled autoregressively and fed into the transformers, rather than injected through cross-attention as in the original MusicLM. This modular design makes it easier to experiment with different conditioning signals and with stereo generation.
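To make the staged design concrete, here is a minimal, illustrative sketch of a single stage. All names (StageTransformer, CLAP_VOCAB, the codebook sizes, and so on) are hypothetical placeholders rather than this repository's actual API; the point is only to show conditioning tokens being prepended to the target sequence and modeled autoregressively by a decoder-only transformer, as described above.

```python
# Illustrative sketch only: names and sizes are assumptions, not the repo's API.
import torch
import torch.nn as nn

CLAP_VOCAB, SEMANTIC_VOCAB = 1024, 1024   # assumed codebook sizes
D_MODEL, N_LAYERS, N_HEADS = 512, 6, 8

class StageTransformer(nn.Module):
    """Decoder-only stage: conditioning tokens are prepended to the target
    tokens and the whole sequence is modeled autoregressively (no cross-attention)."""
    def __init__(self, cond_vocab: int, target_vocab: int):
        super().__init__()
        self.cond_emb = nn.Embedding(cond_vocab, D_MODEL)
        self.tgt_emb = nn.Embedding(target_vocab, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, N_LAYERS)
        self.to_logits = nn.Linear(D_MODEL, target_vocab)

    def forward(self, cond_tokens, target_tokens):
        x = torch.cat([self.cond_emb(cond_tokens), self.tgt_emb(target_tokens)], dim=1)
        # additive causal mask so each position attends only to earlier positions
        causal_mask = torch.triu(
            torch.full((x.size(1), x.size(1)), float("-inf")), diagonal=1
        )
        h = self.backbone(x, mask=causal_mask)
        # score only the positions that correspond to target tokens
        return self.to_logits(h[:, cond_tokens.size(1):])

# Stage 1 of 3: CLAP conditioning tokens -> semantic (MERT-derived) tokens.
# Coarse and fine stages over Encodec tokens would follow the same pattern.
semantic_stage = StageTransformer(CLAP_VOCAB, SEMANTIC_VOCAB)
clap_tokens = torch.randint(0, CLAP_VOCAB, (1, 12))          # stand-in conditioning
semantic_tokens = torch.randint(0, SEMANTIC_VOCAB, (1, 50))  # stand-in targets
logits = semantic_stage(clap_tokens, semantic_tokens)        # (1, 50, SEMANTIC_VOCAB)
```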
Quick Start & Requirements
Create and activate the conda environment with `conda env create -f environment.yaml` and `conda activate open-musiclm`. The `environment.yaml` file and pre-trained `musiclm_large_small_context` checkpoints are available.
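As a quick sanity check after installation, the Encodec component (which supplies the discrete audio tokens mentioned above) can be exercised on its own. The sketch below assumes the public `encodec` package from the environment is importable and is independent of this repository's own training and inference scripts; the two-second random waveform is just a stand-in for real audio.

```python
# Round-trip a short waveform through Encodec to verify the audio tokenizer works.
import torch
from encodec import EncodecModel

model = EncodecModel.encodec_model_24khz()  # pretrained 24 kHz Encodec model
model.set_target_bandwidth(6.0)             # bandwidth controls the number of codebooks

# Stand-in audio: 2 seconds of noise, shaped (batch, channels, samples).
wav = torch.randn(1, model.channels, model.sample_rate * 2)

with torch.no_grad():
    encoded_frames = model.encode(wav)                          # list of (codes, scale) frames
    codes = torch.cat([f[0] for f in encoded_frames], dim=-1)   # (batch, n_q, T) discrete tokens
    recon = model.decode(encoded_frames)                        # waveform rebuilt from tokens

print(codes.shape, recon.shape)
```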
Highlighted Details
Maintenance & Community
The repository was last updated about 2 years ago and is currently inactive.
Licensing & Compatibility
Limitations & Caveats
The project aims for rapid replication rather than strict adherence to the original MusicLM architecture. The effectiveness of CLAP's latent space for music generation is still being evaluated, and it may be retrained or substituted if needed. Pre-trained checkpoints are experimental.