Library for real-time music/audio generation using Stable Diffusion
Riffusion (hobby) is a library for real-time music and audio generation using Stable Diffusion, targeting musicians, sound designers, and researchers. It enables the creation of audio from text prompts and spectrogram images, offering a novel approach to generative audio.
How It Works
Riffusion leverages Stable Diffusion to generate audio by treating spectrograms as images. Within the diffusion pipeline it uses prompt interpolation and image conditioning: interpolating between different text prompts, or conditioning on existing spectrograms, yields smooth transitions between musical styles and sounds, offering a distinctive way to explore and generate audio.
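The two core ideas above, a magnitude spectrogram treated as an "image" that must be inverted back to a waveform, and spherical interpolation between two such representations, can be illustrated with a minimal numpy sketch. This is not Riffusion's actual implementation: the STFT parameters, the Griffin-Lim phase reconstruction, and the slerp over spectrograms are simplified stand-ins for the library's own audio-conversion and interpolation machinery.

```python
import numpy as np

N_FFT, HOP = 256, 64

def stft(x):
    """Frame, window, and FFT a signal -> complex (freq, time) array."""
    win = np.hanning(N_FFT)
    frames = [x[i:i + N_FFT] * win for i in range(0, len(x) - N_FFT + 1, HOP)]
    return np.fft.rfft(np.stack(frames), axis=1).T

def istft(spec):
    """Overlap-add inverse STFT with squared-window normalization."""
    win = np.hanning(N_FFT)
    frames = np.fft.irfft(spec.T, n=N_FFT, axis=1) * win
    n = (frames.shape[0] - 1) * HOP + N_FFT
    x, norm = np.zeros(n), np.zeros(n)
    for i, f in enumerate(frames):
        x[i * HOP:i * HOP + N_FFT] += f
        norm[i * HOP:i * HOP + N_FFT] += win ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, iters=32):
    """Recover a waveform from a magnitude-only spectrogram "image"
    by iteratively re-estimating the missing phase."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(iters):
        x = istft(mag * phase)
        phase = np.exp(1j * np.angle(stft(x)))
    return istft(mag * phase)

def slerp(a, b, t):
    """Spherical interpolation between two flattened arrays, analogous
    to interpolating between prompt embeddings or latents."""
    a_f, b_f = a.ravel(), b.ravel()
    cos = a_f @ b_f / (np.linalg.norm(a_f) * np.linalg.norm(b_f))
    omega = np.arccos(np.clip(cos, -1.0, 1.0))
    out = (np.sin((1 - t) * omega) * a_f + np.sin(t * omega) * b_f) / np.sin(omega)
    return out.reshape(a.shape)

# Demo: blend the spectrograms of two tones halfway, then invert the
# blended "image" back into audio.
t_axis = np.arange(2048) / 8000.0
mag_a = np.abs(stft(np.sin(2 * np.pi * 440 * t_axis)))
mag_b = np.abs(stft(np.sin(2 * np.pi * 880 * t_axis)))
audio = griffin_lim(slerp(mag_a, mag_b, 0.5))
```

In the real pipeline the "image" comes from the diffusion model rather than an STFT of existing audio, but the inversion step back to a waveform plays the same role.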
Quick Start & Requirements
python -m pip install -r requirements.txt
ffmpeg is required for audio formats other than WAV.
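The steps above can be collected into a setup sketch. Only the pip command and the ffmpeg requirement come from the project itself; the virtual environment and the assumption of an existing repository clone are illustrative.

```shell
# Illustrative setup fragment; assumes the Riffusion repository is already cloned
python -m venv .venv            # optional isolated environment (assumption)
. .venv/bin/activate
python -m pip install -r requirements.txt
ffmpeg -version                 # ffmpeg must be on PATH for non-WAV formats
```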
Maintenance & Community
This project is no longer actively maintained.
Licensing & Compatibility
The repository does not explicitly state a license. The associated website and model checkpoints may have different licensing terms.
Limitations & Caveats
The project is explicitly marked as "no longer actively maintained." While CPU is supported, it is noted as "quite slow" for real-time generation. The MPS backend on Apple Silicon may fall back to CPU for audio processing and is not deterministic.