minzwon: Enables comprehensive music analysis and representation via a foundation model
Summary
MusicFM is a foundation model for music informatics that provides versatile audio representations applicable across many downstream tasks. It targets researchers and engineers in music AI, offering a strong base for tasks such as beat tracking, chord recognition, and music tagging, with the aim of simplifying and advancing music analysis.
How It Works
MusicFM employs a masked token modeling approach, inspired by BEST-RQ, where input audio segments are masked, and the model reconstructs their representations. It utilizes a Conformer architecture, demonstrating superior performance over BERT-based models for music tasks. The model supports mixed precision and Flash attention for memory efficiency and can output both frame-level and sequence-level embeddings through adaptive or global average pooling, respectively.
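The BEST-RQ-style pretraining target described above derives discrete tokens from audio frames with a fixed (untrained) random projection and codebook; the model then predicts the tokens of masked frames. A minimal pure-Python sketch of such a random-projection quantizer (names and dimensions are illustrative, not the project's actual code):

```python
import random

def random_projection_quantizer(frames, codebook, proj):
    """BEST-RQ-style targets: project each input frame with a fixed
    random matrix, then emit the index of the nearest codebook vector."""
    tokens = []
    for frame in frames:
        # project frame (dim d_in) into the codebook space (dim d_code)
        z = [sum(w * x for w, x in zip(row, frame)) for row in proj]
        # nearest codebook entry by squared Euclidean distance
        best = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], z)),
        )
        tokens.append(best)
    return tokens

# Illustrative sizes: 4-dim frames, 2-dim codebook space, 8 code vectors.
rng = random.Random(0)
d_in, d_code, n_codes = 4, 2, 8
proj = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_code)]
codebook = [[rng.gauss(0, 1) for _ in range(d_code)] for _ in range(n_codes)]
frames = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(6)]

tokens = random_projection_quantizer(frames, codebook, proj)  # one token per frame
```

Because the projection and codebook stay frozen, the targets are cheap and deterministic; all learning happens in the Conformer that predicts them for masked positions.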
Quick Start & Requirements
Set the HOME_PATH environment variable, then download the dataset statistics and pretrained checkpoints using the wget commands provided in the README: the FMA files (fma_stats.json, pretrained_fma.pt) and the MSD files (msd_stats.json, pretrained_msd.pt).
Note: Model checkpoints released prior to February 13, 2024, were incorrect and require re-download.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Self-supervised foundation models, including MusicFM, perform poorly at musical key detection out of the box and require fine-tuning to improve. The released model is trained on the FMA dataset to avoid licensing issues; larger internal datasets yield better results but are not released. Fine-tuned models for downstream tasks are not publicly available, and the downstream evaluation pipeline is not included in the repository. Fine-tuning requires careful learning-rate management to avoid catastrophic forgetting and potential overfitting, as observed in the tagging task.
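The learning-rate caution above is commonly addressed by giving the pretrained backbone a much smaller learning rate than the freshly initialized task head, limiting how far pretrained weights drift. A minimal sketch with plain SGD (illustrative numbers, not the project's actual fine-tuning code):

```python
def sgd_step(params, grads, lr):
    """One vanilla SGD update: p <- p - lr * g for each parameter."""
    return [p - lr * g for p, g in zip(params, grads)]

# Two parameter groups: the pretrained backbone gets a tiny learning
# rate to guard against catastrophic forgetting, while the randomly
# initialized task head learns much faster.
backbone = [1.0, -0.5]   # stand-in for pretrained weights
head = [0.2]             # stand-in for the new task head

backbone = sgd_step(backbone, [0.1, 0.1], lr=1e-5)
head = sgd_step(head, [0.1], lr=1e-3)
```

In practice this corresponds to per-group learning rates (and often warmup or partial freezing) in the optimizer configuration; the backbone barely moves per step while the head adapts quickly.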
Last updated 2 years ago; the repository is marked inactive.