InspireMusic by FunAudioLLM

Toolkit for music, song, and audio generation

  • Created 10 months ago
  • 1,197 stars
  • Top 32.6% on SourcePulse
  • GitHub: https://github.com/FunAudioLLM/InspireMusic
  • 1 Expert Loves This Project

Project Summary

InspireMusic is a comprehensive toolkit for generating high-fidelity music, songs, and audio, targeting researchers and developers in audio AI. It addresses the challenge of creating long-form, high-quality music with text and audio prompts, offering a unified framework for various generation tasks.

How It Works

InspireMusic employs a two-stage generation process. First, audio tokenizers convert raw audio into discrete tokens. An autoregressive transformer, based on Qwen2.5, predicts these tokens sequentially using both text and audio prompts. Second, a Super-Resolution Flow-Matching Model refines these tokens into high-resolution latent features, which are then converted to audio waveforms by a vocoder. This approach enables coherent, contextually relevant, and detailed audio generation.
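
A minimal sketch of this two-stage hand-off is shown below. Every class name, dimension, and constant in it is an illustrative placeholder with random weights; it is not the toolkit's actual module layout or API, only a picture of how the stages pass data to each other.

```python
# Minimal, self-contained sketch of a two-stage token -> latent -> waveform pipeline.
# All classes are stand-ins with random weights, NOT the real InspireMusic API.
import torch
import torch.nn as nn


class AudioTokenizer(nn.Module):
    """Stage 0: map raw audio (e.g. an audio prompt) to discrete tokens."""
    def __init__(self, codebook_size=4096):
        super().__init__()
        self.codebook_size = codebook_size

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # Placeholder: a real tokenizer quantizes learned latent frames.
        n_frames = waveform.shape[-1] // 320          # assume ~320 samples per token
        return torch.randint(0, self.codebook_size, (waveform.shape[0], n_frames))


class AutoregressiveLM(nn.Module):
    """Stage 1: decoder-only transformer (Qwen2.5-style) that extends the token sequence."""
    def __init__(self, codebook_size=4096, dim=256):
        super().__init__()
        self.embed = nn.Embedding(codebook_size, dim)
        self.head = nn.Linear(dim, codebook_size)

    @torch.no_grad()
    def generate(self, prompt_tokens: torch.Tensor, steps: int) -> torch.Tensor:
        tokens = prompt_tokens
        for _ in range(steps):
            logits = self.head(self.embed(tokens[:, -1:]))   # next-token logits
            next_tok = logits.argmax(dim=-1)                 # greedy pick for the sketch
            tokens = torch.cat([tokens, next_tok], dim=1)
        return tokens


class SuperResolutionFlowMatching(nn.Module):
    """Stage 2: refine coarse tokens into high-resolution latent features."""
    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return torch.randn(tokens.shape[0], 64, tokens.shape[1] * 2)  # (B, latent_dim, T)


class Vocoder(nn.Module):
    """Final stage: latent features -> waveform."""
    def forward(self, latents: torch.Tensor, hop=480) -> torch.Tensor:
        return torch.randn(latents.shape[0], latents.shape[-1] * hop)


# Wire the stages together for a music-continuation-style call.
tokenizer, lm, sr_fm, vocoder = AudioTokenizer(), AutoregressiveLM(), SuperResolutionFlowMatching(), Vocoder()
audio_prompt = torch.randn(1, 24000 * 5)           # 5 s audio prompt at 24 kHz
prompt_tokens = tokenizer(audio_prompt)            # audio -> discrete tokens
generated = lm.generate(prompt_tokens, steps=200)  # AR continuation conditioned on the prompt
latents = sr_fm(generated)                         # tokens -> high-resolution latents
waveform = vocoder(latents)                        # latents -> audio
print(waveform.shape)
```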

Quick Start & Requirements

  • Installation: Clone the repository with git clone --recursive https://github.com/FunAudioLLM/InspireMusic.git. Install dependencies via pip install -r requirements.txt and conda install -y -c conda-forge pynini==2.1.5. Flash attention installation is recommended for speed.
  • Prerequisites: Python >= 3.8, PyTorch >= 2.0.1, CUDA >= 11.8, and flash-attention 2.6.2/2.6.3 (recommended); sox or ffmpeg is recommended for audio processing. A quick environment-check sketch follows this list.
  • Hardware: at least 24GB of GPU memory is recommended for normal mode; 12GB is sufficient for fast mode.
  • Demo: InspireMusic Demo Page
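
The version and memory figures above can be sanity-checked before downloading any checkpoints. The sketch below uses only standard Python and PyTorch introspection; the 24GB/12GB thresholds are simply the recommendations quoted in this list, not values read from the toolkit.

```python
# Quick environment check against the prerequisites listed above.
# Thresholds come from this summary's recommendations, not from the toolkit itself.
import shutil
import sys

import torch

assert sys.version_info >= (3, 8), "Python >= 3.8 is required"
assert tuple(int(x) for x in torch.__version__.split("+")[0].split(".")[:2]) >= (2, 0), \
    "PyTorch >= 2.0 is required (2.0.1 or newer recommended)"

if torch.cuda.is_available():
    print("CUDA runtime:", torch.version.cuda)       # should be >= 11.8
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    mode = "normal (high-fidelity)" if total_gb >= 24 else "fast" if total_gb >= 12 else "insufficient"
    print(f"GPU memory: {total_gb:.1f} GB -> suggested mode: {mode}")
else:
    print("No CUDA device found; GPU inference is not available")

# sox or ffmpeg is recommended for audio I/O.
print("sox:", shutil.which("sox"), "| ffmpeg:", shutil.which("ffmpeg"))
```

If the reported memory lands between 12GB and 24GB, fast mode is the safer default per the hardware note above.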

Highlighted Details

  • Supports text-to-music, music continuation, reconstruction, and super-resolution.
  • Offers long-form music generation capabilities (up to several minutes).
  • Includes pre-trained models for 24kHz and 48kHz audio, with sizes up to 1.5B parameters.
  • Provides both a high-fidelity "normal" mode and a faster, lower-memory inference mode (a small illustrative sketch combining these options follows this list).
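
To make the option space above concrete, here is a small illustrative request descriptor combining the listed tasks, modes, and sample rates. The field names, defaults, and validation rules are hypothetical and do not reflect InspireMusic's actual configuration schema.

```python
# Illustrative request descriptor covering the capabilities listed above.
# Field names and allowed values are hypothetical, not InspireMusic's real schema.
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    task: str                 # "text-to-music", "continuation", "reconstruction", "super-resolution"
    mode: str                 # "normal" (high fidelity, ~24GB GPU) or "fast" (~12GB GPU)
    sample_rate: int          # 24000 or 48000, matching the released checkpoints
    text_prompt: str = ""
    audio_prompt_path: Optional[str] = None
    max_duration_s: float = 180.0   # long-form generation, up to several minutes

    def validate(self) -> None:
        assert self.task in {"text-to-music", "continuation", "reconstruction", "super-resolution"}
        assert self.mode in {"normal", "fast"}
        assert self.sample_rate in {24000, 48000}
        if self.task == "continuation":
            assert self.audio_prompt_path, "continuation needs an audio prompt"


req = GenerationRequest(task="text-to-music", mode="fast", sample_rate=24000,
                        text_prompt="calm piano with soft strings")
req.validate()
print(req)
```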

Maintenance & Community

The project is actively developed by FunAudioLLM. Community discussion is encouraged via DingTalk and WeChat groups.

Licensing & Compatibility

The repository is provided for research purposes. Specific licensing details for commercial use or closed-source linking are not explicitly stated in the README.

Limitations & Caveats

The toolkit currently focuses on music generation; the InspireAudio and InspireSong models are listed as future work. Some demo examples may be sourced from the internet and are accompanied by a disclaimer regarding potential content infringement.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 30 days

Explore Similar Projects

  • AudioLDM by haoheliu: audio generation research paper using latent diffusion. 3k stars, top 0.1% on SourcePulse. Created 2 years ago, updated 2 months ago. Starred by Patrick von Platen (author of Hugging Face Diffusers; Research Engineer at Mistral) and Omar Sanseviero (DevRel at Google DeepMind).