ChatMusician by hf-lin

LLM for music understanding and generation

created 1 year ago
266 stars

Top 96.9% on sourcepulse

Project Summary

ChatMusician is an open-source Large Language Model (LLM) designed to understand and generate music intrinsically, treating it as a second language. It targets researchers and developers interested in multimodal AI and creative language generation, offering a pure text-based approach to music composition and analysis without external multi-modal components.

How It Works

ChatMusician is built on LLaMA2-7B, which is continually pre-trained and fine-tuned on ABC notation, a text-compatible music representation. Because music is represented as plain text, the model processes and generates it with an ordinary text tokenizer, which keeps the architecture simple and lets it plug into existing LLM frameworks. It can compose structured, full-length music conditioned on musical elements such as text, chords, and melodies, while retaining strong general language capabilities.
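To make the "music as text" idea concrete, here is a minimal sketch of what an ABC tune looks like to a program. The tune is an illustrative example written for this summary, not a sample from the ChatMusician datasets, and `split_abc` is a hypothetical helper rather than part of the project. It only shows that an ABC score is an ordinary string with a few header fields followed by note lines.

```python
# A short tune in ABC notation, held as an ordinary Python string.
# Illustrative example only; not taken from the ChatMusician datasets.
ABC_TUNE = """X:1
T:Example Tune
M:4/4
L:1/8
K:D
|: D2 FA d2 AF | G2 Bd g2 dB :|
"""


def split_abc(abc_text):
    """Split an ABC tune into header fields and body (music) lines.

    Hypothetical helper for illustration; real ABC parsing has many more cases.
    """
    headers = {}
    body = []
    in_header = True
    for line in abc_text.strip().splitlines():
        # Header lines look like "K:D": one letter, a colon, then the value.
        if in_header and len(line) >= 2 and line[0].isalpha() and line[1] == ":":
            headers[line[0]] = line[2:].strip()
            # In ABC, the key field (K:) is the last field of the tune header.
            if line[0] == "K":
                in_header = False
        else:
            in_header = False
            body.append(line)
    return headers, body


headers, body = split_abc(ABC_TUNE)
print(headers["T"], headers["M"], headers["K"])  # title, meter, key
print(body[0])                                   # the note line, still plain text
```

Since the whole score is plain text, it can be passed through a standard LLM tokenizer without any audio or symbolic-music front end, which is the property the approach relies on.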

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.8+, PyTorch 2.0+, CUDA 11.4+, DeepSpeed 0.10+. For audio demo: abcmidi and MuseScore.
  • Setup: Local inference requires downloading model weights and potentially pre-processing data.
  • Links: DemoPage, Code, Pretrain Dataset, SFT Dataset, Benchmark, arXiv
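As a rough illustration of the local-inference setup, the sketch below loads the model with Hugging Face `transformers` and generates text from an instruction. Several details are assumptions not stated in this summary: the model ID `m-a-p/ChatMusician`, the `Human: ... </s> Assistant: ` prompt template, and the sampling settings. Check the project's Code and DemoPage links before relying on any of them. `device_map="auto"` additionally requires the `accelerate` package.

```python
# Assumed Hugging Face model ID; verify against the project's own links.
MODEL_ID = "m-a-p/ChatMusician"


def build_prompt(instruction):
    """Wrap an instruction in a simple chat template (assumed, not confirmed here)."""
    return f"Human: {instruction} </s> Assistant: "


def generate_abc(instruction, max_new_tokens=512):
    """Generate a text response (ideally ABC notation) for one instruction."""
    # Heavy imports are kept local so build_prompt works without them installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # half precision to fit a 7B model on one GPU
        device_map="auto",          # needs the `accelerate` package
    )
    inputs = tokenizer(
        build_prompt(instruction),
        return_tensors="pt",
        add_special_tokens=False,
    ).to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.2,  # illustrative sampling values, not tuned
            top_p=0.9,
        )
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads several GB of weights on first use):
# print(generate_abc("Compose a short tune in D major using ABC notation."))
```

The output is plain text; to hear it, the ABC string would still need to be converted with the audio-demo tools listed in the prerequisites.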

Highlighted Details

  • Composes well-structured, full-length music conditioned on text, chords, melodies, motifs, and musical forms.
  • Surpasses LLaMA2 and GPT-3.5 on the MusicTheoryBench benchmark in zero-shot settings.
  • Retains and slightly improves general language abilities, evidenced by MMLU scores.
  • Utilizes ABC notation as a text-compatible music representation for intrinsic LLM integration.

Maintenance & Community

The project was released on 2023-12-10. The last recorded commit was about 1 year ago, so the repository does not appear to be under active development. Community support is available via GitHub issues.

Licensing & Compatibility

The project is released under a permissive license, allowing for commercial use and integration with closed-source applications.

Limitations & Caveats

ChatMusician currently supports only strictly formatted, closed-ended instructions for music tasks; the authors plan to improve generalization with more diverse data. It may hallucinate and is not recommended for music education without further refinement. A significant portion of the training data is in the style of Irish music, and the model's in-context learning and chain-of-thought abilities are noted as weak.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 90 days
