Toolkit for music, song, and audio generation
Top 34.2% on sourcepulse
InspireMusic is a comprehensive toolkit for generating high-fidelity music, songs, and audio, targeting researchers and developers in audio AI. It addresses the challenge of creating long-form, high-quality music with text and audio prompts, offering a unified framework for various generation tasks.
How It Works
InspireMusic employs a two-stage generation process. First, audio tokenizers convert raw audio into discrete tokens. An autoregressive transformer, based on Qwen2.5, predicts these tokens sequentially using both text and audio prompts. Second, a Super-Resolution Flow-Matching Model refines these tokens into high-resolution latent features, which are then converted to audio waveforms by a vocoder. This approach enables coherent, contextually relevant, and detailed audio generation.
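The two-stage flow described above can be illustrated with a toy sketch. Every name and function here is hypothetical, not the InspireMusic API: the real system uses a Qwen2.5-based autoregressive transformer and a super-resolution flow-matching model, whereas this sketch substitutes trivial stand-ins to show how tokens move through the stages.

```python
# Toy illustration of a two-stage "tokenize -> predict -> refine" pipeline.
# All functions are hypothetical stand-ins, not InspireMusic's actual code.

def tokenize(waveform, codebook_size=8):
    """Stage 1a: quantize raw samples (floats in [-1, 1]) into discrete tokens,
    standing in for a learned audio tokenizer."""
    return [min(codebook_size - 1, int((x + 1) / 2 * codebook_size)) for x in waveform]

def predict_tokens(prompt_tokens, n_new):
    """Stage 1b: stand-in for the autoregressive transformer -- here it
    simply repeats the last prompt token instead of sampling from a model."""
    out = list(prompt_tokens)
    for _ in range(n_new):
        out.append(out[-1])
    return out

def super_resolve(tokens, factor=4, codebook_size=8):
    """Stage 2: stand-in for super-resolution flow matching -- upsample the
    token sequence into denser 'latent features' by linear interpolation,
    then rescale back to the [-1, 1] waveform range for a vocoder."""
    feats = []
    for a, b in zip(tokens, tokens[1:] + [tokens[-1]]):
        for i in range(factor):
            t = i / factor
            feats.append((1 - t) * a + t * b)
    return [2 * f / (codebook_size - 1) - 1 for f in feats]

prompt = [0.0, 0.5, -0.25, 1.0]          # a tiny "audio prompt"
tokens = tokenize(prompt)                 # discrete tokens
generated = predict_tokens(tokens, 4)     # autoregressive continuation
audio = super_resolve(generated)          # refined, higher-resolution output
print(len(tokens), len(generated), len(audio))
```

The point of the structure is that stage one works entirely in a compact discrete token space (cheap, long-context generation), while stage two restores fine detail afterward.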
Quick Start & Requirements
git clone --recursive https://github.com/FunAudioLLM/InspireMusic.git
Install dependencies via pip install -r requirements.txt and conda install -y -c conda-forge pynini==2.1.5.
Installing flash attention is recommended for speed; sox or ffmpeg is also recommended.
Highlighted Details
Maintenance & Community
The project is actively developed by FunAudioLLM. Community discussion is encouraged via DingTalk and WeChat groups.
Licensing & Compatibility
The repository is provided for research purposes. Specific licensing details for commercial use or closed-source linking are not explicitly stated in the README.
Limitations & Caveats
The toolkit currently focuses on music generation; the InspireAudio and InspireSong models are listed as future work. Some examples may be sourced from the internet, and the project includes a disclaimer regarding potential content infringement.