Multimodal LLM research code for music-related tasks
This repository provides the code for the LLark multimodal instruction-following language model for music, as detailed in the ICML 2024 paper. It enables researchers and developers to preprocess audio datasets, generate instruction-tuning data, extract embeddings, and evaluate music models, targeting advancements in AI-driven music understanding and generation.
How It Works
The project leverages Apache Beam for scalable data preprocessing, allowing for local execution or parallel processing on Google Cloud Dataflow. It integrates with OpenAI for instruction data generation and utilizes pre-trained Jukebox embeddings and CLAP models for audio feature extraction. The architecture is designed to facilitate the training and evaluation of multimodal music language models.
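As a rough sketch of the local-versus-Dataflow split described above, a minimal Beam preprocessing pipeline might look like the following; the file names, fields, and transforms are illustrative, not the repository's actual entry points:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(input_path: str, output_path: str, runner: str = "DirectRunner") -> None:
    """Tiny preprocessing pipeline: read JSONL metadata, filter it, and re-shard it."""
    # "DirectRunner" executes locally; switching to "DataflowRunner" (plus project,
    # region, and temp_location options) submits the same graph to Cloud Dataflow.
    options = PipelineOptions(runner=runner)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadMetadata" >> beam.io.ReadFromText(input_path)
            | "ParseJson" >> beam.Map(json.loads)
            | "KeepAnnotated" >> beam.Filter(lambda ex: ex.get("annotations"))  # hypothetical field
            | "SerializeJson" >> beam.Map(json.dumps)
            | "WriteShards" >> beam.io.WriteToText(output_path)
        )


if __name__ == "__main__":
    run("metadata.jsonl", "out/processed", runner="DirectRunner")
```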
Quick Start & Requirements
Docker images are provided for training, preprocessing, and Jukebox embedding extraction (m2t-train.dockerfile, m2t-preprocess.dockerfile, jukebox-embed.dockerfile).
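For the embedding-extraction step mentioned above, a minimal sketch of CLAP feature extraction, assuming the standalone laion_clap package (the repository's own scripts, and the Jukebox pipeline built from jukebox-embed.dockerfile, may differ):

```python
import laion_clap

# Load a pretrained CLAP audio encoder; load_ckpt() fetches the default checkpoint.
model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt()

# Embed a batch of audio files; returns a (num_files, 512) numpy array.
audio_files = ["clip_01.wav", "clip_02.wav"]  # hypothetical paths
embeddings = model.get_audio_embedding_from_filelist(x=audio_files, use_tensor=False)
print(embeddings.shape)
```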
Highlighted Details
Maintenance & Community
The repository was last updated about a year ago and is currently inactive.
Licensing & Compatibility
Limitations & Caveats
This repository does not include trained models. The data preprocessing pipeline requires significant setup for Google Cloud Dataflow integration, including Docker image management, and running it on Dataflow incurs cloud costs. Official training support is not provided; the training scripts are offered primarily as a hyperparameter reference.