InspireMusic by FunAudioLLM

Toolkit for music, song, and audio generation

created 9 months ago
1,155 stars

Top 34.2% on sourcepulse

Project Summary

InspireMusic is a comprehensive toolkit for generating high-fidelity music, songs, and audio, targeting researchers and developers in audio AI. It addresses the challenge of creating long-form, high-quality music with text and audio prompts, offering a unified framework for various generation tasks.

How It Works

InspireMusic generates audio in two stages. In the first stage, audio tokenizers convert raw audio into discrete tokens, and an autoregressive transformer based on Qwen2.5 predicts these tokens sequentially, conditioned on both text and audio prompts. In the second stage, a Super-Resolution Flow-Matching Model refines the tokens into high-resolution latent features, which a vocoder then converts to audio waveforms. The design aims for coherent, contextually relevant, and detailed audio generation.
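
The two-stage data flow can be sketched as below. This is a toy illustration only, not InspireMusic's code: every function name, the codebook size, and the upsampling factor are hypothetical, and each stage is replaced by a trivial stand-in (a seeded random sampler for the transformer, linear interpolation for the flow-matching model, and a rescaling step for the vocoder).

```python
import random
from typing import List

VOCAB_SIZE = 1024      # hypothetical codebook size, not taken from the project
UPSAMPLE_FACTOR = 2    # hypothetical super-resolution ratio


def generate_tokens(text_prompt: str, audio_prompt: List[int], n_new: int) -> List[int]:
    """Stage 1 stand-in: extend the audio-prompt tokens one token at a time."""
    rng = random.Random(text_prompt)  # seeded by the text so the sketch is deterministic
    tokens = list(audio_prompt)
    for _ in range(n_new):
        # A real autoregressive model would condition on `tokens` and the text;
        # this stub just samples uniformly from the vocabulary.
        tokens.append(rng.randrange(VOCAB_SIZE))
    return tokens


def refine_to_latents(tokens: List[int]) -> List[float]:
    """Stage 2 stand-in: map tokens to a higher-rate latent sequence in [0, 1]."""
    coarse = [t / (VOCAB_SIZE - 1) for t in tokens]
    latents = []
    # Pair each value with its successor (the last value is paired with itself).
    for a, b in zip(coarse, coarse[1:] + coarse[-1:]):
        for k in range(UPSAMPLE_FACTOR):
            latents.append(a + (b - a) * k / UPSAMPLE_FACTOR)
    return latents


def vocode(latents: List[float]) -> List[float]:
    """Vocoder stand-in: rescale latents from [0, 1] to samples in [-1, 1]."""
    return [2.0 * x - 1.0 for x in latents]


def generate(text_prompt: str, audio_prompt: List[int], n_new: int) -> List[float]:
    """Run the stages in order: tokens, then latents, then waveform samples."""
    tokens = generate_tokens(text_prompt, audio_prompt, n_new)
    latents = refine_to_latents(tokens)
    return vocode(latents)


waveform = generate("calm piano", [1, 2, 3], n_new=5)
```

The point of the sketch is the ordering: discrete tokens are produced first, and only afterwards are they upsampled and turned into samples, so the output length is the token count times the upsampling factor.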

Quick Start & Requirements

  • Installation: Clone the repository with git clone --recursive https://github.com/FunAudioLLM/InspireMusic.git. Install dependencies via pip install -r requirements.txt and conda install -y -c conda-forge pynini==2.1.5. Flash attention installation is recommended for speed.
  • Prerequisites: Python >= 3.8, PyTorch >= 2.0.1, flash attention 2.6.2 or 2.6.3, CUDA >= 11.8. Installing sox or ffmpeg is also recommended.
  • Hardware: For normal mode, at least 24GB GPU memory is recommended; 12GB is sufficient for fast mode.
  • Demo: InspireMusic Demo Page
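
The version and memory figures in this list lend themselves to a quick pre-flight check. The sketch below is a hypothetical helper, not part of InspireMusic: the function names and the returned labels "normal" and "fast" are assumptions, and the thresholds are simply copied from the list above. The PyTorch query is optional and returns None when torch is not installed or no CUDA device is visible.

```python
import itertools
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 8)
MIN_TORCH = (2, 0, 1)
MIN_CUDA = (11, 8)
NORMAL_MODE_GB = 24   # recommended GPU memory for normal mode
FAST_MODE_GB = 12     # stated as sufficient for fast mode


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '2.0.1+cu118' into (2, 0, 1)."""
    core = text.split("+")[0]  # drop local build suffixes like '+cu118'
    parts = []
    for piece in core.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(found: str, minimum: Tuple[int, ...]) -> bool:
    """Compare a version string against a minimum version tuple."""
    return parse_version(found) >= minimum


def recommend_mode(gpu_mem_gb: float) -> Optional[str]:
    """Pick a mode from the memory figures above, or None if below both."""
    if gpu_mem_gb >= NORMAL_MODE_GB:
        return "normal"
    if gpu_mem_gb >= FAST_MODE_GB:
        return "fast"
    return None


def detect_gpu_memory_gb() -> Optional[float]:
    """Total memory of GPU 0 in decimal GB, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


python_ok = sys.version_info[:2] >= MIN_PYTHON
```

Decimal gigabytes are used in `detect_gpu_memory_gb` because a card marketed as 24 GB typically reports slightly less than 24 GiB of usable memory; whether the project's figures are meant in GB or GiB is not stated in the source.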

Highlighted Details

  • Supports text-to-music, music continuation, reconstruction, and super-resolution.
  • Offers long-form music generation capabilities (up to several minutes).
  • Includes pre-trained models for 24kHz and 48kHz audio, with sizes up to 1.5B parameters.
  • Provides both a high-fidelity "normal" mode and a faster inference mode.

Maintenance & Community

The project is actively developed by FunAudioLLM. Community discussion is encouraged via DingTalk and WeChat groups.

Licensing & Compatibility

The repository is provided for research purposes. Specific licensing details for commercial use or closed-source linking are not explicitly stated in the README.

Limitations & Caveats

The toolkit currently focuses on music generation; the InspireAudio and InspireSong models are listed as future work. Some examples may be sourced from the internet, and the project includes a disclaimer regarding possible content infringement.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

88 stars in the last 90 days
