lp-music-caps  by seungheondoh

Music captioning research paper using LLMs

created 2 years ago
336 stars

Top 83.0% on sourcepulse

GitHubView on GitHub
1 Expert Loves This Project
Project Summary

LP-MusicCaps provides a framework for generating descriptive captions for music, targeting researchers and developers in music information retrieval and AI-driven content creation. It offers two primary methods: generating captions from text tags using LLMs and training end-to-end models for audio-to-caption generation, aiming for human-level captioning quality.

How It Works

The project employs a two-stage approach. First, "Tag-to-Caption" leverages OpenAI's GPT-3.5 Turbo API to create detailed captions from user-provided music tags, enabling rich textual descriptions from metadata. Second, "Audio-to-Caption" involves training a cross-model encoder-decoder architecture. This stage first generates "pseudo captions" from audio and tags, then fine-tunes a transfer model on audio-pseudo caption pairs to achieve end-to-end music captioning directly from audio input.

Quick Start & Requirements

  • Installation: pip install -e .
  • Prerequisites: Python 3.10, PyTorch 1.13.1 (ensure CUDA compatibility), OpenAI API key.
  • Quick Start:
    • Tag-to-Caption: cd lpmc/llm_captioning && python run.py --prompt {writing, summary, paraphrase, attribute_prediction} --tags <music_tags>
    • Audio-to-Caption: cd demo && python app.py or cd lpmc/music_captioning && wget https://huggingface.co/seungheondoh/lp-music-caps/resolve/main/transfer.pth -O exp/transfer/lp_music_caps && python captioning.py --audio_path ../../dataset/samples/orchestra.wav
  • Resources: Pre-trained models and datasets are available on Huggingface.

Highlighted Details

  • Paper nominated for ISMIR Best Paper Award (5/104).
  • Invited for TISMIR journal publication.
  • Provides pre-trained models, transfer models, and a music/pseudo-caption dataset.
  • Includes a Huggingface demo for immediate interaction.

Maintenance & Community

The project is associated with authors from ISMIR 2023. Further details on community channels or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

  • License: CC-BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International).
  • Compatibility: Non-commercial use only due to the NC clause.

Limitations & Caveats

The CC-BY-NC 4.0 license restricts commercial use. The "Tag-to-Caption" method relies on the availability and cost of the OpenAI GPT-3.5 Turbo API.

Health Check
Last commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
16 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems) and Jeff Hammerbacher Jeff Hammerbacher(Cofounder of Cloudera).

AudioGPT by AIGC-Audio

0.1%
10k
Audio processing and generation research project
created 2 years ago
updated 1 year ago
Feedback? Help us improve.