llark by spotify-research

Multimodal LLM research code for music-related tasks

created 1 year ago
357 stars

Top 79.4% on sourcepulse

Project Summary

This repository contains the code for LLark, a multimodal instruction-following language model for music, as described in the ICML 2024 paper. Researchers and developers can use it to preprocess audio datasets, generate instruction-tuning data, extract embeddings, and evaluate music models in support of research on AI-driven music understanding.

How It Works

The project uses Apache Beam for scalable data preprocessing, so the same pipelines can run locally or in parallel on Google Cloud Dataflow. Instruction data is generated through OpenAI, and audio features are extracted with pre-trained Jukebox embeddings and CLAP models. Together these pieces support the training and evaluation of multimodal music language models.
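
To illustrate the choice between local and Dataflow execution, here is a minimal sketch of how the runner flags for a Beam pipeline might be assembled. The helper `beam_flags` and all example values (project, region, bucket, image name) are hypothetical and not taken from the repository. The flag names themselves (`--runner`, `--project`, `--region`, `--temp_location`, `--sdk_container_image`) are standard Apache Beam pipeline options.

```python
from typing import List, Optional


def beam_flags(
    runner: str = "DirectRunner",
    project: Optional[str] = None,
    region: Optional[str] = None,
    temp_location: Optional[str] = None,
    sdk_container_image: Optional[str] = None,
) -> List[str]:
    """Build a list of Beam command-line flags (hypothetical helper).

    DirectRunner executes the pipeline on the local machine;
    DataflowRunner submits it to Google Cloud Dataflow.
    """
    flags = [f"--runner={runner}"]
    if runner == "DataflowRunner":
        # Dataflow needs to know where to run and where to stage temp files.
        required = {
            "project": project,
            "region": region,
            "temp_location": temp_location,
        }
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError(f"DataflowRunner requires: {', '.join(missing)}")
        flags += [f"--{name}={value}" for name, value in required.items()]
        if sdk_container_image:
            # Custom worker image, e.g. one pushed to Artifact Registry.
            flags.append(f"--sdk_container_image={sdk_container_image}")
    return flags


if __name__ == "__main__":
    # Local run: a single flag is enough.
    print(beam_flags())
    # Cloud run: all values below are placeholders.
    print(
        beam_flags(
            "DataflowRunner",
            project="my-project",
            region="us-central1",
            temp_location="gs://my-bucket/tmp",
        )
    )
```

A list like this can be passed to `apache_beam.options.pipeline_options.PipelineOptions(flags)` when constructing a pipeline. The repository's own scripts may accept these options in a different way.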

Quick Start & Requirements

  • Installation: All code should be run within provided Docker environments (m2t-train.dockerfile, m2t-preprocess.dockerfile, jukebox-embed.dockerfile).
  • Prerequisites: Large-scale data processing may require a Google Cloud account with Dataflow set up, which can incur costs. The repository also describes pushing Docker images to Google Artifact Registry for this purpose.
  • Resources: Embedding ~100k audio files takes under 1 hour with default parameters.
  • Links: Paper Preprint, Spotify Research Blog, Companion Site, ICML Page (link targets are not reproduced here).
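
As a rough sketch of the Docker setup listed above, the snippet below builds a `docker build` command for each of the three named dockerfiles. The helper `build_command`, the image tags (derived from the file names), and the build context `.` are assumptions for illustration. The repository's own instructions may use different tags or extra arguments.

```python
import shlex
from typing import List

# Dockerfile names as listed in the installation notes.
DOCKERFILES = [
    "m2t-train.dockerfile",
    "m2t-preprocess.dockerfile",
    "jukebox-embed.dockerfile",
]


def build_command(dockerfile: str, context: str = ".") -> List[str]:
    """Return a `docker build` argv for one dockerfile (hypothetical helper)."""
    # Assumed tag convention: file name without its extension.
    tag = dockerfile.rsplit(".", 1)[0]
    return ["docker", "build", "-f", dockerfile, "-t", tag, context]


if __name__ == "__main__":
    for name in DOCKERFILES:
        # Print the command; pass the list to subprocess.run to execute it.
        print(shlex.join(build_command(name)))
```

Printing the commands first makes it easy to check them before running anything. To execute one, pass the list to `subprocess.run(cmd, check=True)` on a machine with Docker installed.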

Highlighted Details

  • Codebase supports preprocessing, instruction data generation, Jukebox/CLAP embedding extraction, and model evaluation.
  • Apache Beam pipelines are central to data processing, with options for local or cloud execution.
  • Includes scripts for training LLark, MPT-1B, and CLAP-based models, though official training support is limited.
  • Evaluation notebooks are provided for reproducibility, requiring access to specific music datasets.

Maintenance & Community

  • Authors include Josh Gardner and Peter Sobot.
  • Updates can be followed via @SpotifyResearch.
  • Spotify's Open Source Code of Conduct applies.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0.
  • Contains subsets of code from LLaVA and jukemir.

Limitations & Caveats

This repository does not include trained models. The data preprocessing pipeline requires significant setup for Google Cloud Dataflow integration, including Docker image management and potential cost implications. Official training support is not provided, with scripts offered primarily for hyperparameter reference.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 16 stars in the last 90 days
