mustango by AMAAI-Lab

Text-to-music generation research paper using multimodal LLMs

created 1 year ago
372 stars

Top 77.3% on sourcepulse

Project Summary

Mustango is an open-source project enabling controllable text-to-music generation. It targets researchers and developers interested in AI-driven music creation, offering a Latent Diffusion Model (LDM) approach for generating music from detailed textual descriptions.

How It Works

Mustango combines a Latent Diffusion Model (LDM) with the Flan-T5 language model and music-specific features such as chords and beats. This architecture allows fine-grained control over the generated music through textual prompts and aims for higher fidelity and musical coherence than previous methods. Because an LDM runs the diffusion process in a compressed latent space rather than directly on audio, it can generate high-resolution output more efficiently.
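The "latent diffusion" part of the description above can be illustrated with the generic forward (noising) process that LDMs learn to invert. This is a minimal textbook DDPM-style sketch, not code from the Mustango repository: the linear beta schedule, the 1000-step count, and the four-number toy latent are all assumptions, and the Flan-T5 and musical-feature conditioning that Mustango adds to the denoiser is deliberately omitted.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod over s <= t of (1 - beta_s)."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def noise_latent(z0, t, abar, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(abar_t) * z_0, (1 - abar_t) * I).

    Returns the noisy latent and the Gaussian noise that was added; that noise
    is the target a noise-prediction denoiser would be trained to recover.
    """
    a = abar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, eps)]
    return zt, eps


rng = random.Random(0)
betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
# Hypothetical toy latent; a real LDM latent would come from an audio autoencoder.
z0 = [0.5, -1.0, 2.0, 0.0]
zt, eps = noise_latent(z0, 999, abar, rng)
```

At the last step `abar` is close to zero, so `zt` is almost pure noise; generation runs this in reverse, with a trained (and, in Mustango's case, text- and music-conditioned) network removing noise step by step.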

Quick Start & Requirements

Highlighted Details

  • Achieves superior subjective evaluation scores on the MusicBench dataset compared to Tango and other models across multiple musical attributes.
  • Utilizes the MusicBench dataset, containing 52k music fragments with text captions, for training and evaluation.
  • Supports multi-GPU training via Hugging Face's accelerate package.
  • Offers pre-trained models for immediate use.
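To make "detailed textual descriptions" concrete: a caption can spell out tempo, key, and chord progression in plain sentences. The toy parser below is a hypothetical illustration only; the caption wording, the regular expressions, and the `MusicControls` type are all assumptions, and this is not how Mustango itself derives musical conditions from text.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MusicControls:
    """Hypothetical container for musical attributes named in a caption."""
    bpm: Optional[int]
    key: Optional[str]
    chords: List[str]


# Assumed phrasings: "90 bpm", "key of D minor", "chord progression is Dm, Bb, F, C".
_BPM = re.compile(r"(\d{2,3})\s*(?:bpm|beats per minute)", re.IGNORECASE)
_KEY = re.compile(r"\bkey of ([A-G][#b]?\s+(?:major|minor))", re.IGNORECASE)
_CHORDS = re.compile(r"chord progression is ([^.]+)", re.IGNORECASE)


def parse_controls(caption: str) -> MusicControls:
    """Pull tempo, key, and chords out of a caption; missing fields stay empty."""
    bpm_m = _BPM.search(caption)
    key_m = _KEY.search(caption)
    chords_m = _CHORDS.search(caption)
    return MusicControls(
        bpm=int(bpm_m.group(1)) if bpm_m else None,
        key=key_m.group(1) if key_m else None,
        chords=[c.strip() for c in chords_m.group(1).split(",")] if chords_m else [],
    )
```

For a hypothetical caption such as "A calm piano piece at 90 bpm in the key of D minor. The chord progression is Dm, Bb, F, C.", this returns a tempo of 90, the key "D minor", and the chords Dm, Bb, F, C; a caption with none of these phrases yields empty fields.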

Maintenance & Community

The project is associated with AMAAI-Lab. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Because it builds on Hugging Face diffusers and pretrained models, users are also bound by those components' licenses; these are generally permissive for research and commercial use, but verify each one before relying on it.

Limitations & Caveats

The README recommends training for at least 40 epochs when training from scratch on the MusicBench dataset, so retraining requires significant compute. The project is presented as research code, and its stability for production use is not guaranteed.
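One way to gauge what "at least 40 epochs" means in practice is to count optimizer updates. The sketch assumes roughly 52,000 training examples (the "52k" MusicBench figure, rounded) and hypothetical per-GPU batch sizes and GPU counts. Under data-parallel multi-GPU training, such as the setup Accelerate provides, each update consumes per-GPU batch × number of GPUs × gradient-accumulation steps examples.

```python
import math


def optimizer_steps(num_examples, epochs, per_device_batch, num_devices, grad_accum=1):
    """Total optimizer updates for data-parallel training.

    Each update sees per_device_batch * num_devices * grad_accum examples
    (the effective global batch). A partial final batch in each epoch is
    counted as one more step (ceiling division).
    """
    global_batch = per_device_batch * num_devices * grad_accum
    steps_per_epoch = math.ceil(num_examples / global_batch)
    return steps_per_epoch * epochs


# Hypothetical setups: per-GPU batch of 8 on 4 GPUs versus a single GPU.
multi_gpu = optimizer_steps(52_000, 40, per_device_batch=8, num_devices=4)
single_gpu = optimizer_steps(52_000, 40, per_device_batch=8, num_devices=1)
```

With these assumed settings, 40 epochs is 65,000 updates on four GPUs and 260,000 on one. Adding GPUs shrinks the number of updates per epoch but not the total number of examples processed.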

Health Check

Last commit: 2 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 13 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeff Hammerbacher (Cofounder of Cloudera).

AudioGPT by AIGC-Audio

0.1%
10k
Audio processing and generation research project
created 2 years ago
updated 1 year ago