minzwon: Enables comprehensive music analysis and representation via a foundation model
Summary
MusicFM is a foundation model for music informatics that provides versatile audio representations applicable across many downstream tasks. It targets researchers and engineers in music AI, offering a strong base for tasks such as beat tracking, chord recognition, and music tagging, with the aim of simplifying and advancing music analysis.
How It Works
MusicFM employs a masked token modeling approach, inspired by BEST-RQ, where input audio segments are masked, and the model reconstructs their representations. It utilizes a Conformer architecture, demonstrating superior performance over BERT-based models for music tasks. The model supports mixed precision and Flash attention for memory efficiency and can output both frame-level and sequence-level embeddings through adaptive or global average pooling, respectively.
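The BEST-RQ-style pretraining target described above derives discrete tokens from audio frames with a fixed (untrained) random projection and codebook; the model then predicts the tokens of masked frames. A minimal pure-Python sketch of such a random-projection quantizer (names and dimensions are illustrative, not the project's actual code):

```python
import random

def random_projection_quantizer(frames, codebook, proj):
    """BEST-RQ-style targets: project each input frame with a fixed
    random matrix, then emit the index of the nearest codebook vector."""
    tokens = []
    for frame in frames:
        # project frame (dim d_in) into the codebook space (dim d_code)
        z = [sum(w * x for w, x in zip(row, frame)) for row in proj]
        # nearest codebook entry by squared Euclidean distance
        best = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], z)),
        )
        tokens.append(best)
    return tokens

# Illustrative sizes: 4-dim frames, 2-dim codebook space, 8 code vectors.
rng = random.Random(0)
d_in, d_code, n_codes = 4, 2, 8
proj = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_code)]
codebook = [[rng.gauss(0, 1) for _ in range(d_code)] for _ in range(n_codes)]
frames = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(6)]

tokens = random_projection_quantizer(frames, codebook, proj)  # one token per frame
```

Because the projection and codebook stay frozen, the targets are cheap and deterministic; all learning happens in the Conformer that predicts them for masked positions.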
Quick Start & Requirements
Set the HOME_PATH environment variable, then download the dataset statistics and pretrained checkpoints using the wget commands provided in the README: the FMA files (fma_stats.json, pretrained_fma.pt) and the MSD files (msd_stats.json, pretrained_msd.pt).
Note: Model checkpoints released prior to February 13, 2024, were incorrect and require re-download.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Self-supervised foundation models, including MusicFM, perform poorly at musical key detection out of the box and require fine-tuning to improve. The released model is trained on the FMA dataset to avoid licensing issues; larger internal datasets yield better results but are not released. Fine-tuned models for downstream tasks are not publicly available, and the downstream evaluation pipeline is not included in the repository. Fine-tuning requires careful learning-rate management to avoid catastrophic forgetting and potential overfitting, as observed in the tagging task.
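The learning-rate caution above is commonly addressed by giving the pretrained backbone a much smaller learning rate than the freshly initialized task head, limiting how far pretrained weights drift. A minimal sketch with plain SGD (illustrative numbers, not the project's actual fine-tuning code):

```python
def sgd_step(params, grads, lr):
    """One vanilla SGD update: p <- p - lr * g for each parameter."""
    return [p - lr * g for p, g in zip(params, grads)]

# Two parameter groups: the pretrained backbone gets a tiny learning
# rate to guard against catastrophic forgetting, while the randomly
# initialized task head learns much faster.
backbone = [1.0, -0.5]   # stand-in for pretrained weights
head = [0.2]             # stand-in for the new task head

backbone = sgd_step(backbone, [0.1, 0.1], lr=1e-5)
head = sgd_step(head, [0.1], lr=1e-3)
```

In practice this corresponds to per-group learning rates (and often warmup or partial freezing) in the optimizer configuration; the backbone barely moves per step while the head adapts quickly.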
Last updated 2 years ago; the repository is marked inactive.