mustango by AMAAI-Lab

Text-to-music generation research paper using multimodal LLMs

Created 2 years ago

386 stars

Top 74.4% on SourcePulse

Project Summary

Mustango is an open-source project enabling controllable text-to-music generation. It targets researchers and developers interested in AI-driven music creation, offering a Latent Diffusion Model (LDM) approach for generating music from detailed textual descriptions.

How It Works

Mustango combines a Latent Diffusion Model (LDM) with the Flan-T5 language model and music-specific features. Flan-T5 encodes the text prompt, and the diffusion model is conditioned on that encoding together with the musical features. This gives fine-grained control over the generated music and aims for higher fidelity and musical coherence than earlier methods. Because an LDM runs the diffusion process in a compressed latent space rather than directly on the output signal, it generates high-resolution outputs more efficiently.
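To make the latent-diffusion idea concrete, here is a minimal toy sketch of conditional sampling in a latent space, written in plain Python. It is not Mustango's implementation: the real denoiser is a neural network conditioned on text embeddings, whereas the "oracle" predictor, the TARGETS table, the 4-dimensional latent, and the linear noise schedule below are all hypothetical stand-ins chosen so the loop can run on its own. The sampler uses deterministic DDIM-style updates, a common choice for diffusion models, which may differ from what the project actually uses.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(z0, eps, alpha_bar):
    """Forward process: z_t = sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(z0, eps)]


def predict_z0(z_t, eps_hat, alpha_bar):
    """Invert the forward formula given a noise estimate eps_hat."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(z - b * e) / a for z, e in zip(z_t, eps_hat)]


def ddim_sample(noise_predictor, cond, dim, alpha_bars, seed=0):
    """Deterministic DDIM-style reverse loop, from pure noise to a clean latent."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(alpha_bars))):
        eps_hat = noise_predictor(z, t, cond)          # conditioned noise estimate
        z0_hat = predict_z0(z, eps_hat, alpha_bars[t])
        # Re-noise the estimate to the previous (less noisy) level, or stop at t = 0.
        z = add_noise(z0_hat, eps_hat, alpha_bars[t - 1]) if t > 0 else z0_hat
    return z


# Hypothetical stand-in for a trained, text-conditioned denoiser: each prompt
# maps to a fixed "clean" latent, and the oracle returns the exact noise.
TARGETS = {
    "calm piano": [0.5, -0.2, 0.1, 0.0],
    "fast drums": [-1.0, 0.8, 0.3, 0.6],
}


def make_oracle(alpha_bars):
    def predictor(z_t, t, cond):
        a, b = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
        return [(z - a * x) / b for z, x in zip(z_t, TARGETS[cond])]
    return predictor


alpha_bars = make_alpha_bars(50)
latent = ddim_sample(make_oracle(alpha_bars), "calm piano", dim=4, alpha_bars=alpha_bars)
print(latent)
```

In a real LDM the sampled latent would then be passed through a decoder to produce audio; that stage is omitted here. The sketch only shows how the conditioning signal steers every denoising step.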

Quick Start & Requirements

Install: Clone the repository, cd into it, and run "pip install -r requirements.txt". Then cd into the diffusers directory and run "pip install -e ." (the trailing dot, meaning the current directory, is part of the command).
Prerequisites: Python, soundfile, IPython, and Hugging Face diffusers; accelerate is additionally required for training.
Demo: Available on Hugging Face Spaces: https://huggingface.co/spaces/declare-lab/mustango
Model: Available on Hugging Face: https://huggingface.co/declare-lab/mustango
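
Once the install steps are done, generating a clip from the pre-trained checkpoint might look like the sketch below. Treat the specifics as assumptions to verify against the repository: the Mustango class, its constructor taking the Hugging Face model id, the generate method returning a waveform array, and the 16 kHz sample rate are not stated in this summary. The prompt_to_filename helper and the outputs directory are purely illustrative names.

```python
import re
from pathlib import Path

SAMPLE_RATE = 16000  # assumed output sample rate; confirm against the model card


def prompt_to_filename(prompt, max_len=60):
    """Turn a free-text prompt into a safe .wav filename (illustrative helper)."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return (slug[:max_len].rstrip("_") or "output") + ".wav"


def generate_to_wav(prompt, out_dir="outputs", checkpoint="declare-lab/mustango"):
    """Generate music for `prompt`, write it to a WAV file, and return the path."""
    # Imported lazily so the filename helper works without the heavy dependencies.
    import soundfile as sf
    from mustango import Mustango  # assumed: module and class provided by the cloned repo

    model = Mustango(checkpoint)    # assumed constructor signature
    audio = model.generate(prompt)  # assumed to return a 1-D waveform array
    out_path = Path(out_dir) / prompt_to_filename(prompt)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    sf.write(str(out_path), audio, samplerate=SAMPLE_RATE)
    return out_path
```

Calling generate_to_wav("A calm piano piece in C major") from inside the cloned repository would then write the result under outputs/, provided the assumed API matches. The first call is likely to download the model weights from Hugging Face.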

Highlighted Details

Achieves superior subjective evaluation scores on the MusicBench dataset compared to Tango and other models across multiple musical attributes.
Utilizes the MusicBench dataset, containing 52k music fragments with text captions, for training and evaluation.
Supports multi-GPU training via Hugging Face's accelerate package.
Offers pre-trained models for immediate use.

Maintenance & Community

The project is associated with AMAAI-Lab. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. It builds on Hugging Face diffusers and on model weights hosted on Hugging Face, each of which carries its own license terms. Users should verify those terms before relying on the project for research or commercial use.

Limitations & Caveats

When training from scratch on the MusicBench dataset, at least 40 epochs are recommended, which implies a significant computational cost for retraining. The project is presented as research, and stability for production use is not guaranteed.
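
As a rough back-of-the-envelope illustration of that cost, the snippet below converts the 40-epoch figure and the roughly 52k-fragment dataset size into a count of optimizer steps. The per-device batch size of 4 and the 4 GPUs are hypothetical values chosen for the example, not settings taken from the project, and 52,000 is the rounded dataset size quoted above.

```python
import math


def optimizer_steps(num_samples, epochs, per_device_batch, num_devices, grad_accum=1):
    """Total optimizer steps for a data-parallel run over the whole dataset."""
    effective_batch = per_device_batch * num_devices * grad_accum
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return steps_per_epoch * epochs


# Hypothetical setup: 4 GPUs, 4 samples per GPU, no gradient accumulation.
total = optimizer_steps(num_samples=52_000, epochs=40, per_device_batch=4, num_devices=4)
print(total)  # 130000
```

Multiplying the result by a measured time per step on the available hardware gives a wall-clock estimate before committing to a full run.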

Explore Similar Projects

SongGen by LiuZH-19

ChatMusician by hf-lin

music-generation-research by AI-Guru

lp-music-caps by seungheondoh

MuMu-LLaMA by shansongliu

genmusic_demo_list by affige

tango by declare-lab

FunMusic by FunAudioLLM

MusicGPT by gabotechs

SongGeneration by tencent-ailab

jukebox by openai

audiocraft by facebookresearch