Music understanding model for question answering and captioning
MU-LLaMA is a language model for music understanding: it answers questions about a given piece of music and generates captions for music files, which can in turn be used to build Text-to-Music datasets. The project targets researchers and developers working on multimodal AI for music, offering a novel approach to integrating audio context into large language models.
How It Works
MU-LLaMA builds on a LLaMA-2 backbone and incorporates the MERT music encoder. An adapter layer fuses the music context information into the language model, guiding LLaMA's output for music-related question answering and captioning. MERT was selected after a comparative analysis of several music representation models.
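A minimal sketch of the fusion idea, assuming the public HuggingFace MERT release (m-a-p/MERT-v1-330M) and illustrative dimensions (1024 for MERT features, 4096 for a 7B LLaMA); the adapter shown is a simplified stand-in for MU-LLaMA's actual fusion module, not the repository's code:

import numpy as np
import torch
import torch.nn as nn
from transformers import AutoModel, Wav2Vec2FeatureExtractor

# Load the public MERT music encoder (the exact variant MU-LLaMA uses may differ).
mert = AutoModel.from_pretrained("m-a-p/MERT-v1-330M", trust_remote_code=True)
processor = Wav2Vec2FeatureExtractor.from_pretrained(
    "m-a-p/MERT-v1-330M", trust_remote_code=True
)

class MusicAdapter(nn.Module):
    """Simplified adapter: pools MERT frame embeddings over time and
    projects them into the language model's hidden space."""
    def __init__(self, mert_dim=1024, llama_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(mert_dim, llama_dim),
            nn.GELU(),
            nn.Linear(llama_dim, llama_dim),
        )

    def forward(self, frames):            # frames: (batch, time, mert_dim)
        pooled = frames.mean(dim=1)       # mean-pool over the time axis
        return self.proj(pooled)          # (batch, llama_dim)

# Encode 5 seconds of placeholder audio and project it into LLaMA's space.
audio = np.random.randn(5 * processor.sampling_rate).astype("float32")
inputs = processor(audio, sampling_rate=processor.sampling_rate, return_tensors="pt")
with torch.no_grad():
    frames = mert(**inputs).last_hidden_state   # (1, time, 1024)
music_prefix = MusicAdapter()(frames)           # conditions LLaMA's generation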
Quick Start & Requirements
Launch the Gradio demo, which assumes the MU-LLaMA checkpoint at ./ckpts/checkpoint.pth and the LLaMA weights under ./ckpts/LLaMA:

python gradio_app.py --model ./ckpts/checkpoint.pth --llama_dir ./ckpts/LLaMA
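Beyond the demo UI, captions can be harvested in batch to assemble Text-to-Music training pairs. The loop below is a hypothetical sketch: load_model and answer_question are placeholder names standing in for the repository's actual inference entry points and should be replaced accordingly.

import json
from pathlib import Path

def load_model(checkpoint, llama_dir):
    # Hypothetical stand-in for MU-LLaMA's model-loading code.
    return None

def answer_question(model, audio_path, prompt):
    # Hypothetical stand-in for MU-LLaMA's inference call.
    return "placeholder caption"

model = load_model("./ckpts/checkpoint.pth", "./ckpts/LLaMA")
pairs = []
for audio in sorted(Path("music").glob("*.wav")):   # your audio collection
    caption = answer_question(model, str(audio), "Describe this piece of music.")
    pairs.append({"audio": audio.name, "text": caption})

# One (caption, audio) pair per file: the raw material of a Text-to-Music dataset.
Path("text_to_music.json").write_text(json.dumps(pairs, indent=2))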
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats