stable-audio-3 by Stability-AI

Audio generation platform for music and sound effects

Created 3 months ago

573 stars

Top 55.6% on SourcePulse

1 Expert Loves This Project

Luis Capelo

Cofounder of Lightning AI

Project Summary

Stable Audio 3 is an open platform for fast, high-quality audio and music generation, with streamlined inference and fine-tuning. It targets researchers and power users who need efficient tools for creating and editing audio, and it offers state-of-the-art models and flexible hardware support.

How It Works

The project is built on a new Semantic-Acoustic Music Encoder (SAME) autoencoder that supports stereo audio at 44.1 kHz. It provides three core inference modes: text-to-audio, audio-to-audio editing, and inpainting/continuation. The design allows variable-length generation, efficient VRAM use, and personalization through stackable LoRA fine-tuning, balancing generative tractability against reconstruction quality.
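To make "stackable LoRA" concrete, here is a minimal sketch of the general LoRA idea: each adapter contributes a scaled low-rank update, and several adapters are summed onto the same base weight. This is an illustration of the generic technique, not the repository's implementation; the function names `matmul` and `apply_lora_stack`, the `(down, up, scale)` tuple layout, and the toy 2x2 weights are all assumptions made for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora_stack(base, adapters):
    """Return a copy of `base` with every adapter's low-rank update added.

    Each adapter is a hypothetical (down, up, scale) tuple where
    `down` has shape (rank, in_dim) and `up` has shape (out_dim, rank).
    """
    merged = [row[:] for row in base]  # leave the base weight untouched
    for down, up, scale in adapters:
        delta = matmul(up, down)  # (out_dim, rank) @ (rank, in_dim)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


# Toy 2x2 base weight and two rank-1 adapters (illustrative values only).
base = [[1.0, 0.0], [0.0, 1.0]]
adapter_a = ([[1.0, 1.0]], [[1.0], [0.0]], 0.5)
adapter_b = ([[0.0, 2.0]], [[0.0], [1.0]], 1.0)

merged = apply_lora_stack(base, [adapter_a, adapter_b])
print(merged)  # [[1.5, 0.5], [0.0, 3.0]]
```

Because the updates are purely additive, adapters can be added, removed, or reweighted without retraining the base weights, which is what makes this style of personalization cheap to adjust at runtime.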

Quick Start & Requirements

Install: run uv sync to install dependencies.
Run: start the Gradio UI with uv run python run_gradio.py --model medium.
Prerequisites: Python and the uv package manager. The default PyTorch build targets CUDA 12.6+; a specific version can be pinned. Flash Attention 2 is required for the medium model.
Links: Technical Report, Models, Extra Models, Discord, Demo, Blog Post.

Highlighted Details

Fast, state-of-the-art generation capable of producing minutes of audio in milliseconds.
Supports three distinct inference modes: text-to-audio, audio-to-audio editing, and inpainting/continuation.
Enables variable-length generation, optimizing inference time and VRAM usage.
Offers personalization via stackable LoRA fine-tuning, adaptable at runtime.
Broad hardware support includes CPU (Small models), CUDA/TensorRT (Medium), and Apple Silicon (CoreML).

Maintenance & Community

The project is associated with the Harmonai Discord server, which hosts discussions and weekly office hours on AI audio and music. The README also mentions the underfit tool by Dadabots as an experimental option for advanced LoRA training.

Licensing & Compatibility

The project is released under the Stability AI Community License. Specific compatibility notes for commercial use or closed-source linking are not detailed in the README.

Limitations & Caveats

The 'Large' model is available only via API and is not supported by this repository. Stable Audio 3 Medium requires Flash Attention 2, and a faulty Flash Attention installation can produce static or glitch sounds in the output, so resolving installation problems is essential before using the medium model.
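Since a missing Flash Attention install is called out as a failure mode for the medium model, a small preflight check can catch the most basic case before launching. The sketch below assumes the dependency is importable under the name `flash_attn` (the import name of the flash-attn package); the helper names `flash_attention_available` and `preflight` are hypothetical and not part of the repository.

```python
import importlib.util


def flash_attention_available(module_name="flash_attn"):
    """Return True if the module can be located, without importing it.

    This only shows the package is installed; it does not prove the build
    matches the local CUDA/PyTorch versions.
    """
    return importlib.util.find_spec(module_name) is not None


def preflight(model):
    """Return a list of problems found; an empty list means none detected."""
    problems = []
    if model == "medium" and not flash_attention_available():
        problems.append(
            "Flash Attention 2 was not found; the medium model requires it."
        )
    return problems


if __name__ == "__main__":
    for problem in preflight("medium"):
        print("warning:", problem)
```

A passing check is necessary but not sufficient: a package that is present but built against the wrong toolchain could still misbehave, so audible static remains a signal to revisit the installation.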

Health Check

Last Commit

3 days ago

Responsiveness

Inactive

Star History

99 stars in the last 30 days