MU-LLaMA by shansongliu

Music understanding model for question answering and captioning

created 2 years ago
283 stars

Top 93.4% on sourcepulse

Project Summary

MU-LLaMA is a language model designed for music understanding tasks, specifically answering questions about music and generating captions for music files to create Text-to-Music datasets. It targets researchers and developers working on multimodal AI for music, offering a novel approach to integrate audio context into large language models.

How It Works

MU-LLaMA builds upon the LLaMA-2 backbone, incorporating the MERT music encoder. An adapter layer is used to fuse music context information, guiding LLaMA's output for music-related question answering and captioning. MERT was selected after comparative analysis of various music representation models.
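
The sketch below illustrates the fusion idea in PyTorch: pooled MERT frame embeddings are projected by a small adapter into a handful of LLaMA-sized "music context" embeddings and prepended to the token embeddings. The class names, dimensions, pooling, and prefix scheme are illustrative assumptions, not the repository's actual implementation, which may inject the music features deeper inside the transformer (in the style of LLaMA-Adapter) rather than at the input level.

    import torch
    import torch.nn as nn

    class MusicAdapter(nn.Module):
        """Project pooled MERT features into a few LLaMA-sized music-context tokens (illustrative)."""
        def __init__(self, mert_dim=1024, llama_dim=4096, num_prefix_tokens=4):
            super().__init__()
            self.num_prefix_tokens = num_prefix_tokens
            self.proj = nn.Sequential(
                nn.Linear(mert_dim, llama_dim),
                nn.GELU(),
                nn.Linear(llama_dim, llama_dim * num_prefix_tokens),
            )

        def forward(self, mert_features):             # (batch, frames, mert_dim)
            pooled = mert_features.mean(dim=1)         # temporal average pooling
            prefix = self.proj(pooled)                 # (batch, llama_dim * num_prefix_tokens)
            return prefix.view(prefix.shape[0], self.num_prefix_tokens, -1)

    def fuse_music_context(adapter, mert_features, llama_token_embeds):
        """Prepend the adapter's music tokens to the LLaMA input embeddings."""
        music_tokens = adapter(mert_features)          # (batch, num_prefix_tokens, llama_dim)
        return torch.cat([music_tokens, llama_token_embeds], dim=1)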

Quick Start & Requirements

  • Run (Gradio demo): python gradio_app.py --model ./ckpts/checkpoint.pth --llama_dir ./ckpts/LLaMA
  • Prerequisites: Python 3.9.17, the gated LLaMA-2 model weights (obtainable via HuggingFace), and the MU-LLaMA pretrained weights (downloadable separately).
  • Setup: Download the LLaMA-2 and MU-LLaMA weights and place them where the command above expects them (./ckpts/LLaMA and ./ckpts/checkpoint.pth); a hedged download sketch follows this list.
  • Demo: https://huggingface.co/spaces/shansongliu/MU-LLaMA
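
The following is a minimal sketch of fetching the gated LLaMA-2 weights with the huggingface_hub library, as referenced in the setup bullet above. The repo id, directory layout, and the separate MU-LLaMA checkpoint location are assumptions; adjust them to whatever gradio_app.py actually expects.

    # Assumes access to Meta's gated LLaMA-2 repository has been granted on
    # HuggingFace and that you are logged in via `huggingface-cli login`.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="meta-llama/Llama-2-7b",   # gated repo with original-format weights
        local_dir="./ckpts/LLaMA",         # matches the --llama_dir flag above
    )
    # The MU-LLaMA checkpoint itself is downloaded separately (see the repository's
    # instructions) and placed at ./ckpts/checkpoint.pth for the --model flag.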

Highlighted Details

  • Achieves state-of-the-art results on music understanding benchmarks, outperforming models such as LTU and LLaMA-Adapter.
  • Provides code for generating a MusicQA dataset from MusicCaps and MagnaTagATune.
  • Supports both question answering and music captioning.
  • Leverages the MERT model for music representation.

Maintenance & Community

  • The project is maintained by authors affiliated with academic institutions.
  • Citation details are provided in BibTeX format.

Licensing & Compatibility

  • The README does not explicitly state a license for the MU-LLaMA code itself; because the model depends on LLaMA-2 weights, use is likely subject to Meta's LLaMA-2 license terms.

Limitations & Caveats

  • Generating the MusicQA dataset is computationally intensive, requiring approximately 8 days on a Tesla V100 GPU.
  • Requires obtaining LLaMA-2 model weights separately.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 8 stars in the last 90 days
