MU-LLaMA  by shansongliu

Music understanding model for question answering and captioning

Created 2 years ago
291 stars

Top 90.6% on SourcePulse

Project Summary

MU-LLaMA is a language model designed for music understanding tasks, specifically answering questions about music and generating captions for music files to create Text-to-Music datasets. It targets researchers and developers working on multimodal AI for music, offering a novel approach to integrate audio context into large language models.

How It Works

MU-LLaMA builds upon the LLaMA-2 backbone, incorporating the MERT music encoder. An adapter layer is used to fuse music context information, guiding LLaMA's output for music-related question answering and captioning. MERT was selected after comparative analysis of various music representation models.
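
The fusion step described above can be pictured with a toy example. This is a minimal sketch under stated assumptions, not the MU-LLaMA implementation: it assumes the adapter projects a music embedding to the language model's hidden size and adds it to each token state through a scalar gate. The function names, the dimensions, and the gating scheme are all hypothetical and chosen only for illustration.

```python
# Toy illustration of gated additive fusion (hypothetical, not MU-LLaMA's code).
# Plain Python lists stand in for tensors so the example has no dependencies.

def linear(x, weight, bias):
    """Compute y = W x + b for one vector x; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse_music_context(token_states, music_embedding, weight, bias, gate):
    """Add a gated projection of the music embedding to every token state.

    With gate == 0 the token states pass through unchanged, so the language
    model initially behaves as if no music context were present.
    """
    projected = linear(music_embedding, weight, bias)
    return [[h + gate * p for h, p in zip(state, projected)] for state in token_states]


# Assumed toy sizes: a 3-dimensional music embedding (stand-in for a MERT
# feature) projected to a 2-dimensional hidden state.
music_embedding = [1.0, 2.0, 3.0]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
bias = [0.0, 0.5]
token_states = [[0.0, 0.0], [1.0, -1.0]]

fused = fuse_music_context(token_states, music_embedding, weight, bias, gate=1.0)
unchanged = fuse_music_context(token_states, music_embedding, weight, bias, gate=0.0)
```

The gate is included because a value of zero leaves the backbone's behavior untouched, which is a common way to introduce a new conditioning signal gradually; whether MU-LLaMA uses exactly this form is not stated in the summary above.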

Quick Start & Requirements

  • Run the demo: python gradio_app.py --model ./ckpts/checkpoint.pth --llama_dir ./ckpts/LLaMA
  • Prerequisites: Python 3.9.17, LLaMA-2 model weights (obtainable via HuggingFace), MU-LLaMA pretrained weights (downloadable).
  • Setup: Download the LLaMA-2 and MU-LLaMA weights; the command above expects them at ./ckpts/LLaMA and ./ckpts/checkpoint.pth respectively.
  • Demo: https://huggingface.co/spaces/shansongliu/MU-LLaMA
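
Since the demo depends on two separately downloaded sets of weights, it can help to check that they are in place before launching. The following is a small sketch, assuming the default paths from the command above; the helper names `missing_paths` and `build_demo_command` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Default locations taken from the launch command; adjust if yours differ.
DEFAULT_MODEL = "./ckpts/checkpoint.pth"
DEFAULT_LLAMA_DIR = "./ckpts/LLaMA"


def missing_paths(model=DEFAULT_MODEL, llama_dir=DEFAULT_LLAMA_DIR):
    """Return a list of human-readable messages for weights that are absent."""
    problems = []
    if not Path(model).is_file():
        problems.append(f"MU-LLaMA checkpoint not found: {model}")
    if not Path(llama_dir).is_dir():
        problems.append(f"LLaMA-2 weight directory not found: {llama_dir}")
    return problems


def build_demo_command(model=DEFAULT_MODEL, llama_dir=DEFAULT_LLAMA_DIR):
    """Build the argument list for launching the Gradio demo."""
    return ["python", "gradio_app.py", "--model", str(model), "--llama_dir", str(llama_dir)]
```

A typical use would be to print the messages from `missing_paths()` and, if the list is empty, pass the result of `build_demo_command()` to `subprocess.run` from the directory that contains gradio_app.py.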

Highlighted Details

  • Reports state-of-the-art results on music understanding benchmarks, outperforming models such as LTU and LLaMA Adapter.
  • Provides code for generating a MusicQA dataset from MusicCaps and MagnaTagATune.
  • Supports both question answering and music captioning.
  • Leverages the MERT model for music representation.

Maintenance & Community

  • The project is associated with authors from academic institutions.
  • Citation details are provided in BibTeX format.

Licensing & Compatibility

  • The README does not explicitly state a license for the MU-LLaMA code itself. Because the model builds on LLaMA-2, its use is likely subject to the LLaMA-2 license.

Limitations & Caveats

  • Generating the MusicQA dataset is computationally intensive, requiring approximately 8 days on a Tesla V100 GPU.
  • Requires obtaining LLaMA-2 model weights separately.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 30 days
