lp-music-caps by seungheondoh

Music captioning research paper using LLMs

Created 2 years ago
338 stars

Top 81.5% on SourcePulse

1 Expert Loves This Project
Project Summary

LP-MusicCaps provides a framework for generating descriptive captions for music, targeting researchers and developers in music information retrieval and AI-driven content creation. It offers two primary methods: generating captions from text tags using LLMs and training end-to-end models for audio-to-caption generation, aiming for human-level captioning quality.

How It Works

The project employs a two-stage approach. First, "Tag-to-Caption" leverages OpenAI's GPT-3.5 Turbo API to create detailed captions from user-provided music tags, enabling rich textual descriptions from metadata. Second, "Audio-to-Caption" trains a cross-modal encoder-decoder architecture: pseudo captions are first generated from music tags via the LLM, the model is trained on audio-pseudo-caption pairs, and a transfer model is then fine-tuned to perform end-to-end music captioning directly from audio input.
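For illustration, the sketch below shows the "Tag-to-Caption" idea in miniature: prompt an LLM to expand a list of music tags into a one-sentence caption. The prompt wording, model name, and helper function are assumptions for this example, not the repository's exact implementation (see lpmc/llm_captioning/run.py for that).

    # Minimal sketch of the Tag-to-Caption idea: ask an LLM to expand music tags
    # into a one-sentence caption. The prompt wording, model name, and function
    # below are illustrative assumptions, not the repository's exact code.
    from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY in the environment

    client = OpenAI()

    def tags_to_caption(tags: list[str]) -> str:
        """Turn a list of music tags into a descriptive natural-language caption."""
        prompt = (
            "Write a single-sentence description of a piece of music "
            f"with the following tags: {', '.join(tags)}."
        )
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    print(tags_to_caption(["orchestra", "cinematic", "slow tempo", "lush strings"]))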

Quick Start & Requirements

  • Installation: pip install -e .
  • Prerequisites: Python 3.10, PyTorch 1.13.1 (ensure CUDA compatibility), and an OpenAI API key for the LLM captioning step.
  • Quick Start:
    • Tag-to-Caption: cd lpmc/llm_captioning && python run.py --prompt {writing, summary, paraphrase, attribute_prediction} --tags <music_tags>
    • Audio-to-Caption (demo app): cd demo && python app.py
    • Audio-to-Caption (CLI): cd lpmc/music_captioning && wget https://huggingface.co/seungheondoh/lp-music-caps/resolve/main/transfer.pth -O exp/transfer/lp_music_caps && python captioning.py --audio_path ../../dataset/samples/orchestra.wav (see the checkpoint-download sketch after this list)
  • Resources: Pre-trained models and datasets are available on Hugging Face.
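As referenced in the Audio-to-Caption (CLI) item above, here is a minimal sketch of fetching the transfer checkpoint programmatically with huggingface_hub instead of wget and then invoking the repository's captioning script. The repo_id and filename come from the Quick Start URL; the target path mirrors the wget -O argument and is an assumption about the repository layout.

    # Hedged sketch: fetch the transfer checkpoint with huggingface_hub instead of
    # wget, place it where the Quick Start expects it, and run captioning.py.
    # Assumes the repository is cloned, installed (pip install -e .), and that this
    # script is executed from the repository root.
    import shutil
    import subprocess
    from pathlib import Path

    from huggingface_hub import hf_hub_download

    # Download transfer.pth (the same file the Quick Start's wget command fetches).
    ckpt_path = hf_hub_download(
        repo_id="seungheondoh/lp-music-caps",
        filename="transfer.pth",
    )

    # Copy it to the location used by the Quick Start's `wget -O` flag.
    target = Path("lpmc/music_captioning/exp/transfer/lp_music_caps")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(ckpt_path, target)

    # Run the repository's captioning script on the bundled sample audio clip.
    subprocess.run(
        ["python", "captioning.py", "--audio_path", "../../dataset/samples/orchestra.wav"],
        cwd="lpmc/music_captioning",
        check=True,
    )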

Highlighted Details

  • Paper nominated for ISMIR Best Paper Award (5/104).
  • Invited for TISMIR journal publication.
  • Provides pre-trained models, transfer models, and a music/pseudo-caption dataset.
  • Includes a Hugging Face demo for immediate interaction.

Maintenance & Community

The project accompanies a paper presented at ISMIR 2023. Further details on community channels or a roadmap are not explicitly provided in the README.

Licensing & Compatibility

  • License: CC-BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International).
  • Compatibility: Non-commercial use only due to the NC clause.

Limitations & Caveats

The CC-BY-NC 4.0 license restricts commercial use. The "Tag-to-Caption" method relies on the availability and cost of the OpenAI GPT-3.5 Turbo API.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

Amphion by open-mmlab

0.2% · 9k stars
Toolkit for audio, music, and speech generation research
Created 1 year ago · Updated 3 months ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App), and 11 more.

jukebox by openai

0.1% · 8k stars
Generative model for music research paper
Created 5 years ago · Updated 1 year ago