InspireMusic by FunAudioLLM

Toolkit for music, song, and audio generation

  • Created 10 months ago
  • 1,197 stars
  • Top 32.6% on SourcePulse
  • GitHub: https://github.com/FunAudioLLM/InspireMusic
  • 1 Expert Loves This Project

Project Summary

InspireMusic is a comprehensive toolkit for generating high-fidelity music, songs, and audio, targeting researchers and developers in audio AI. It addresses the challenge of creating long-form, high-quality music with text and audio prompts, offering a unified framework for various generation tasks.

How It Works

InspireMusic employs a two-stage generation process. First, audio tokenizers convert raw audio into discrete tokens. An autoregressive transformer, based on Qwen2.5, predicts these tokens sequentially using both text and audio prompts. Second, a Super-Resolution Flow-Matching Model refines these tokens into high-resolution latent features, which are then converted to audio waveforms by a vocoder. This approach enables coherent, contextually relevant, and detailed audio generation.
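
A minimal sketch of this two-stage hand-off is shown below. Every class name, dimension, and constant in it is an illustrative placeholder with random weights; it is not the toolkit's actual module layout or API, only a picture of how the stages pass data to each other.

```python
# Minimal, self-contained sketch of a two-stage token -> latent -> waveform pipeline.
# All classes are stand-ins with random weights, NOT the real InspireMusic API.
import torch
import torch.nn as nn


class AudioTokenizer(nn.Module):
    """Stage 0: map raw audio (e.g. an audio prompt) to discrete tokens."""
    def __init__(self, codebook_size=4096):
        super().__init__()
        self.codebook_size = codebook_size

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # Placeholder: a real tokenizer quantizes learned latent frames.
        n_frames = waveform.shape[-1] // 320          # assume ~320 samples per token
        return torch.randint(0, self.codebook_size, (waveform.shape[0], n_frames))


class AutoregressiveLM(nn.Module):
    """Stage 1: decoder-only transformer (Qwen2.5-style) that extends the token sequence."""
    def __init__(self, codebook_size=4096, dim=256):
        super().__init__()
        self.embed = nn.Embedding(codebook_size, dim)
        self.head = nn.Linear(dim, codebook_size)

    @torch.no_grad()
    def generate(self, prompt_tokens: torch.Tensor, steps: int) -> torch.Tensor:
        tokens = prompt_tokens
        for _ in range(steps):
            logits = self.head(self.embed(tokens[:, -1:]))   # next-token logits
            next_tok = logits.argmax(dim=-1)                 # greedy pick for the sketch
            tokens = torch.cat([tokens, next_tok], dim=1)
        return tokens


class SuperResolutionFlowMatching(nn.Module):
    """Stage 2: refine coarse tokens into high-resolution latent features."""
    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return torch.randn(tokens.shape[0], 64, tokens.shape[1] * 2)  # (B, latent_dim, T)


class Vocoder(nn.Module):
    """Final stage: latent features -> waveform."""
    def forward(self, latents: torch.Tensor, hop=480) -> torch.Tensor:
        return torch.randn(latents.shape[0], latents.shape[-1] * hop)


# Wire the stages together for a music-continuation-style call.
tokenizer, lm, sr_fm, vocoder = AudioTokenizer(), AutoregressiveLM(), SuperResolutionFlowMatching(), Vocoder()
audio_prompt = torch.randn(1, 24000 * 5)           # 5 s audio prompt at 24 kHz
prompt_tokens = tokenizer(audio_prompt)            # audio -> discrete tokens
generated = lm.generate(prompt_tokens, steps=200)  # AR continuation conditioned on the prompt
latents = sr_fm(generated)                         # tokens -> high-resolution latents
waveform = vocoder(latents)                        # latents -> audio
print(waveform.shape)
```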

Quick Start & Requirements

  • Installation: Clone the repository with git clone --recursive https://github.com/FunAudioLLM/InspireMusic.git. Install dependencies via pip install -r requirements.txt and conda install -y -c conda-forge pynini==2.1.5. Flash attention installation is recommended for speed.
  • Prerequisites: Python >= 3.8, PyTorch >= 2.0.1, CUDA >= 11.8, and flash-attention 2.6.2/2.6.3 (recommended); sox or ffmpeg is recommended for audio processing. A quick environment-check sketch follows this list.
  • Hardware: at least 24GB of GPU memory is recommended for normal mode; 12GB is sufficient for fast mode.
  • Demo: InspireMusic Demo Page
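
The version and memory figures above can be sanity-checked before downloading any checkpoints. The sketch below uses only standard Python and PyTorch introspection; the 24GB/12GB thresholds are simply the recommendations quoted in this list, not values read from the toolkit.

```python
# Quick environment check against the prerequisites listed above.
# Thresholds come from this summary's recommendations, not from the toolkit itself.
import shutil
import sys

import torch

assert sys.version_info >= (3, 8), "Python >= 3.8 is required"
assert tuple(int(x) for x in torch.__version__.split("+")[0].split(".")[:2]) >= (2, 0), \
    "PyTorch >= 2.0 is required (2.0.1 or newer recommended)"

if torch.cuda.is_available():
    print("CUDA runtime:", torch.version.cuda)       # should be >= 11.8
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    mode = "normal (high-fidelity)" if total_gb >= 24 else "fast" if total_gb >= 12 else "insufficient"
    print(f"GPU memory: {total_gb:.1f} GB -> suggested mode: {mode}")
else:
    print("No CUDA device found; GPU inference is not available")

# sox or ffmpeg is recommended for audio I/O.
print("sox:", shutil.which("sox"), "| ffmpeg:", shutil.which("ffmpeg"))
```

If the reported memory lands between 12GB and 24GB, fast mode is the safer default per the hardware note above.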

Highlighted Details

  • Supports text-to-music, music continuation, reconstruction, and super-resolution.
  • Offers long-form music generation capabilities (up to several minutes).
  • Includes pre-trained models for 24kHz and 48kHz audio, with sizes up to 1.5B parameters.
  • Provides both a high-fidelity "normal" mode and a faster, lower-memory inference mode (a small illustrative sketch combining these options follows this list).
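
To make the option space above concrete, here is a small illustrative request descriptor combining the listed tasks, modes, and sample rates. The field names, defaults, and validation rules are hypothetical and do not reflect InspireMusic's actual configuration schema.

```python
# Illustrative request descriptor covering the capabilities listed above.
# Field names and allowed values are hypothetical, not InspireMusic's real schema.
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    task: str                 # "text-to-music", "continuation", "reconstruction", "super-resolution"
    mode: str                 # "normal" (high fidelity, ~24GB GPU) or "fast" (~12GB GPU)
    sample_rate: int          # 24000 or 48000, matching the released checkpoints
    text_prompt: str = ""
    audio_prompt_path: Optional[str] = None
    max_duration_s: float = 180.0   # long-form generation, up to several minutes

    def validate(self) -> None:
        assert self.task in {"text-to-music", "continuation", "reconstruction", "super-resolution"}
        assert self.mode in {"normal", "fast"}
        assert self.sample_rate in {24000, 48000}
        if self.task == "continuation":
            assert self.audio_prompt_path, "continuation needs an audio prompt"


req = GenerationRequest(task="text-to-music", mode="fast", sample_rate=24000,
                        text_prompt="calm piano with soft strings")
req.validate()
print(req)
```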

Maintenance & Community

The project is actively developed by FunAudioLLM. Community discussion is encouraged via DingTalk and WeChat groups.

Licensing & Compatibility

The repository is provided for research purposes. Specific licensing details for commercial use or closed-source linking are not explicitly stated in the README.

Limitations & Caveats

The toolkit currently focuses on music generation; the InspireAudio and InspireSong models are listed as future work. Some demo examples may be sourced from the internet and are accompanied by a disclaimer regarding potential content infringement.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 30 days

Explore Similar Projects

  • AudioLDM by haoheliu: audio generation research paper using latent diffusion. 3k stars, top 0.1% on SourcePulse. Created 2 years ago, updated 2 months ago. Starred by Patrick von Platen (author of Hugging Face Diffusers; Research Engineer at Mistral) and Omar Sanseviero (DevRel at Google DeepMind).